We revisit the well-known problem of sorting under partial information: sort a finite set given the
outcomes of comparisons between some pairs of elements. The input is a partially ordered set $P$,
and solving the problem amounts to discovering an unknown linear extension of $P$, using pairwise
comparisons. The information-theoretic lower bound on the number of comparisons needed in
the worst case is $\log e(P)$, the binary logarithm of the number of linear extensions of $P$. In a
breakthrough paper, Jeff Kahn and Jeong Han Kim (J. Comput. System Sci. 51 (3), 390-399,
1995) showed that there exists a polynomial-time algorithm for the problem achieving this bound
up to a constant factor. Their algorithm invokes the ellipsoid algorithm at each iteration for
determining the next comparison, making it impractical.

We develop efficient algorithms for sorting under partial information. Like Kahn and Kim, our 
approach relies on graph entropy. However, our algorithms differ in essential ways from theirs. 
Rather than resorting to convex programming for computing the entropy, we approximate the 
entropy, or make sure it is computed only once, in a restricted class of graphs, permitting the use 
of a simpler algorithm. Specifically, we present: 

(1) an $O(n^2)$ algorithm performing $O(\log n \cdot \log e(P))$ comparisons;

(2) an $O(n^{2.5})$ algorithm performing at most $(1 + \varepsilon) \log e(P) + O_\varepsilon(n)$ comparisons;

(3) an $O(n^{2.5})$ algorithm performing $O(\log e(P))$ comparisons.

All our algorithms can be implemented in such a way that their computational bottleneck is
confined to a preprocessing phase, while the sorting phase is completed in $O(q) + O(n)$ time,
where $q$ denotes the number of comparisons performed.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]:
Nonnumerical Algorithms and Problems; G.2.2 [Discrete Mathematics]: Graph Theory

General Terms: Algorithms, Theory 

Additional Key Words and Phrases: Partial order, graph entropy 



1. INTRODUCTION 

Problem Definition. We consider the following problem: 

Let $V = \{v_1, \ldots, v_n\}$ be a set equipped with an unknown linear order $\le$. Given
a subset of the relations $v_i \le v_j$, determine the complete linear order by queries of
the form: "is $v_i \le v_j$?".

Fig. 1. An instance of the problem of sorting under partial information. In this example, we use
4 comparisons (dashed edges). At every step, the Hasse diagram of the currently known partial
order is shown.

This problem is called Sorting under Partial Information. We are given the outcomes
of a number of comparisons between elements of a linearly ordered set, and
we wish to "complete the sort" by performing more comparisons. The partially
ordered set (poset) $P = (V, \le_P)$ encoding these known outcomes constitutes partial
information that should help reduce the number of comparisons performed. Denoting
by $e(P)$ the number of linear extensions of $P$, the number of required comparisons
is at least $\log e(P)$ in the worst case$^5$, since each comparison can at best halve
the set of linear extensions that remain possible. An example is given
in Figure 1.
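To make the lower bound concrete, the following Python sketch counts the linear extensions of a small poset and reports $\log e(P)$. It is only an illustration under our own assumptions: the names `count_linear_extensions` and `information_lower_bound`, and the encoding of the poset as pairs of indices, are hypothetical and not part of the algorithms of this paper. The method is a dynamic program over sets of already placed elements, so it is exponential in $n$ and only suitable for tiny instances.

```python
from functools import lru_cache
from math import log2


def count_linear_extensions(n, relations):
    """Count linear extensions of a poset on {0, ..., n-1}.

    `relations` is a list of pairs (u, v) meaning u < v in the poset.
    The pairs need not be transitively closed.
    """
    # preds[v] is a bitmask of the elements that must come before v.
    preds = [0] * n
    for u, v in relations:
        preds[v] |= 1 << u

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # `placed` is the bitmask of elements already put in the order.
        if placed == full:
            return 1
        total = 0
        for v in range(n):
            already_placed = (placed >> v) & 1
            preds_done = (preds[v] & ~placed) == 0
            if not already_placed and preds_done:
                total += count(placed | (1 << v))
        return total

    return count(0)


def information_lower_bound(n, relations):
    """Return log2 of the number of linear extensions."""
    return log2(count_linear_extensions(n, relations))
```

For instance, two disjoint chains of two elements each admit 6 linear extensions, so at least $\lceil \log 6 \rceil = 3$ comparisons are needed in the worst case.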

Previous Results. The problem was first posed by Fredman [1976]. He showed
that there exists an algorithm that performs $\log e(P) + 2n$ additional comparisons
between elements of $V$. However, the number of comparisons performed by Fredman's
algorithm is not $O(\log e(P))$ when $\log e(P)$ is sub-linear, and deciding which
comparisons should be done takes super-polynomial time. At that time, it remained
open whether there existed, on the one hand, an algorithm performing $O(\log e(P))$
comparisons, and, on the other hand, an algorithm running in polynomial time.

The first question was answered by Kahn and Saks [1984]. They showed that
there always exists a query of the form "is $v_i < v_j$?" such that the fraction of
linear extensions in which $v_i$ is smaller than $v_j$ lies in the interval $(3/11, 8/11)$.
This is a relaxation of the well-known 1/3-2/3 conjecture, a conjecture formulated
independently by Fredman, Linial, and Stanley, see [Linial 1984]. A simpler proof
yielding weaker bounds was given by Kahn and Linial [1991]. Better bounds were
later given by Brightwell et al. [1995]. Iteratively choosing such a comparison yields
an algorithm that performs $O(\log e(P))$ comparisons. However, finding the right
comparisons remained intractable.
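The notion of a balanced query can be illustrated by exhaustive enumeration. The sketch below lists all linear extensions of a tiny poset and returns the incomparable pair whose fraction is closest to 1/2. The function names and the input encoding are assumptions made for this illustration only; the enumeration takes time proportional to $n!$, which is exactly the kind of cost that makes this approach intractable in general.

```python
from fractions import Fraction
from itertools import permutations


def linear_extensions(n, relations):
    """Return every linear extension as a dict element -> position."""
    extensions = []
    for perm in permutations(range(n)):
        pos = {v: i for i, v in enumerate(perm)}
        if all(pos[u] < pos[v] for u, v in relations):
            extensions.append(pos)
    return extensions


def most_balanced_pair(n, relations):
    """Return (a, b, fraction) for the most balanced incomparable pair.

    `fraction` is the share of linear extensions in which a precedes b.
    Returns None when the poset is a chain (no incomparable pair).
    """
    extensions = linear_extensions(n, relations)
    best = None
    for a in range(n):
        for b in range(a + 1, n):
            below = sum(1 for pos in extensions if pos[a] < pos[b])
            frac = Fraction(below, len(extensions))
            if frac in (0, 1):
                continue  # a and b are comparable in the poset
            score = abs(frac - Fraction(1, 2))
            if best is None or score < best[0]:
                best = (score, a, b, frac)
    return None if best is None else best[1:]
```

On the poset with a single relation $0 < 1$ and a third isolated element, the best available fraction is 2/3 (or symmetrically 1/3), which is the standard example showing that the 1/3-2/3 conjecture would be tight.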

In 1995, Kahn and Kim published a breakthrough paper [Kahn and Kim 1995]
in which they describe a polynomial-time algorithm performing $O(\log e(P))$
comparisons, thus answering both questions positively. Their key insight is to relate
$\log e(P)$ to the entropy of the incomparability graph of $P$, a quantity that can
be computed in polynomial time. Their algorithm, although polynomial, is still
far from practical because it uses the ellipsoid algorithm $O(\log e(P)) = O(n \log n)$
times to determine the comparisons.

Contribution. Our results are summarized in Table 1 below. 



5 Throughout the paper, $\log x$ denotes the binary logarithm of $x$.
Algorithm              Global complexity              Number of comparisons

[Kahn and Kim 1995]    $O(n \log n \cdot EA(n))$      $\le 9.82 \cdot \log e(P)$
Algorithm 1            $O(n^2)$                       $O(\log n \cdot \log e(P))$
Algorithm 2            $O(n^{2.5})$                   $\le (1 + \varepsilon) \log e(P) + O_\varepsilon(n)$
Algorithm 3            $O(n^{2.5})$                   $\le 15.09 \cdot \log e(P)$

Table 1. We denote by $EA(n)$ the time needed for the ellipsoid algorithm to compute the entropy
of a poset of order $n$. The original bound given by Kahn and Kim on the number of comparisons
performed by their algorithm is $54.45 \cdot \log e(P)$. The improved bound given in the table is a
byproduct of our results. (The notation $O_\varepsilon(n)$ means that the hidden constant may depend on
$\varepsilon$.)

We now compare these results to those of Kahn and Kim (denoted: K&K). In 
terms of global complexity, each of our algorithms greatly improves over that of 
K&K. Furthermore: 

- If $\log e(P)$ is super-linear in $n$, the number of comparisons of our second algorithm
is lower than that of K&K. By optimizing over $\varepsilon$, it can be shown that the number
of comparisons is actually $\log e(P) + o(\log e(P)) + O(n)$ in this case, a number
of comparisons comparable to that of Fredman's algorithm.

- If $\log e(P)$ is linear or sub-linear in $n$, the number of comparisons of our third
algorithm is comparable to that of K&K, although the constant in front of $\log e(P)$
is still far from the best constant achieved by a super-polynomial algorithm
[Brightwell et al. 1995].

- Our algorithms have the following useful property: they compute information
that guides the sorting and can then be reused to solve any given instance with the
same partial information $P$, in time proportional to the number of comparisons,
plus a term linear in $n$.

Outline and Key Ideas. K&K showed that graph entropy, as defined by Körner
[1973], plays a central role in the problem of sorting under partial information.
Letting $H(\bar{P})$ be the entropy of the incomparability graph of $P$, they showed
that $\log e(P) = \Theta(nH(\bar{P}))$. Every comparison performed by their algorithm
decreases $nH(\bar{P})$ by at least some constant. Hence the total number of comparisons
is $O(nH(\bar{P}))$ and thus $O(\log e(P))$. Furthermore, their algorithm is polynomial,
because the entropy can be computed in polynomial time using convex programming.

Our goal is to obtain practical algorithms, without sacrificing the number of
comparisons. Our first key idea is to compute a greedy chain decomposition of $P$, that
is, a partition of $P$ into chains (totally ordered subsets), obtained by iteratively
extracting a longest chain. This allows us to get rid of the costly convex programming
machinery and enables us to focus only on the relevant part of $P$. In [Cardinal
et al. 2010a], we have provided bounds on the amount of information (in terms of
entropy) that is lost when we forget the relations of $P$ between two distinct chains
of a greedy chain decomposition.

We directly obtain a mergesort-like algorithm: find a greedy chain decomposition
of $P$, and merge the chains using a simple linear-time merging algorithm. The
number of comparisons performed by this algorithm can be shown to be close to
$\log e(P)$, up to an arbitrarily small factor and a term linear in $n$. This is described
in Section 5.

As noted above, our mergesort-like algorithm performs better than that of K&K
provided the information-theoretic lower bound $\log e(P)$ is super-linear. The
algorithms are comparable (in terms of number of comparisons) if $\log e(P)$ is linear. If
$\log e(P)$ is sub-linear, we have to use another strategy: instead of forgetting all the
relations of $P$ between the chains of a greedy chain decomposition, we keep some of
them. Namely, we keep all the relations between the elements of the longest chain
and the rest of $P$. When $\log e(P)$ is small compared to $n$, the longest chain contains
a large fraction of the elements. Hence, this less radical strategy keeps most of the
information contained in $P$.

Our second key idea is contained in the following algorithm: find a longest chain
$A$, use the mergesort-like algorithm on $P - A$, yielding a chain $B$, and cautiously
merge the chains $A$ and $B$ using the current partial information. Thus we reduce
the general sorting problem to an easier subproblem known as merging under partial
information. It is a special case of the problem of sorting under partial information
in which $P$ can be covered by exactly two chains, and has been studied by Linial
[1984]. By using an algorithm for merging under partial information performing
$O(\log e(P))$ comparisons, we obtain an algorithm for the general sorting problem
performing $O(\log e(P))$ comparisons. This is shown in Section 6.

The problem of merging under partial information is tackled in Section 7.
Although Linial [1984] has already provided an algorithm for the problem, we
considerably improve on its complexity. We first show that in this special case, the entropy
of the incomparability graph of $P$ can be computed very easily. The computation
relies on a structural lemma on the entropy of bipartite graphs by Körner and Marton
[1988], and on the additional structure exhibited by the incomparability graph
of a poset covered by two chains.

Then, we show that given the vertex weights achieving the entropy, there exists a
sequence of pairwise chain mergings, each of which decreases $nH(\bar{P})$ by an amount
proportional to the number of comparisons performed. After each merging, the
weights on the vertices can be updated efficiently. This yields the desired algorithm
for merging under partial information, and thus an algorithm for sorting under
partial information performing $O(\log e(P))$ comparisons. The global complexity of
the algorithm is $O(n^{2.5})$.

The plan of the paper is as follows. Preliminaries on complexity measures, the
entropy of a graph, and greedy chain decompositions, are given in Section 2. In
Section 3, we offer new results on the entropy, improving several aspects of K&K's
analysis. Mainly, we prove the tight inequality $nH(\bar{P}) \le 2 \log e(P)$, whereas K&K
show $nH(\bar{P}) \le (1 + 7 \log e) \log e(P) \approx 11.1 \log e(P)$.

As a first simple example of a near-optimal algorithm for sorting under partial
information, we describe our (simple) Algorithm 1 in Section 4. This algorithm
has global complexity $O(n^2)$ and performs a number of comparisons within a $\log n$
factor only of the information-theoretic lower bound.

As mentioned above, the mergesort-like algorithm is given in Section 5, while
Sections 6 and 7 are devoted to our main algorithm performing $O(\log e(P))$
comparisons. In the last section, Section 8, we explain how that algorithm can be
implemented in such a way that all costly computations are done in a preprocessing
phase. As a result, the algorithm can reuse the information computed during that
preprocessing phase and solve any other instance with the same partial information
$P$, in time proportional to the number of comparisons plus a term linear in $n$.

We also include an appendix, in which we discuss the complexities of some
important steps used in our algorithms, among which is the construction of a greedy
chain decomposition.

Finally, we mention that the algorithm for merging under partial information
given in the preliminary version [Cardinal et al. 2010b] of this paper is slightly
different from the one presented here. The two main advantages of the new
algorithm are that it is simpler and that it can be implemented so that the sorting phase
takes $O(q) + O(n)$ time, where $q$ is the number of comparisons performed by the
algorithm. Achieving the latter property was left as an open problem in [Cardinal
et al. 2010b].

Other Related Works. In 2004, Yao proved that the information-theoretic lower
bound for the problem of sorting under partial information also holds for quantum
decision trees, up to a term linear in $n$ [Yao 2004]. His analysis also relies on the
notion of graph entropy.

In a recent paper, Daskalakis et al. [2009] analyze the problem of discovering
a partial order using comparisons. In that setting, a comparison can have three
outcomes, including one stating that the two elements are incomparable, and the
goal is to completely identify the underlying partial order. They propose an
algorithm performing a number of comparisons that is within a constant factor of the
information-theoretic lower bound for partial orders of a given width.

2. PRELIMINARIES 

We give a number of definitions and basic results, and summarize the contribution 
of Kahn and Kim [1995] to the problem. 

Complexity Measures. Consider an algorithm for sorting under partial information.
The query complexity is the number of comparisons between elements of $P$
that are done by the algorithm. The preprocessing complexity measures the
computational work done before the first comparison is performed. The rest of the
work is measured by the sorting complexity. The preprocessing and sorting
phases are defined accordingly. Thus, in the preprocessing phase, we may
only process the input poset. The comparisons are performed during the sorting
phase. The global complexity is simply the sum of the preprocessing and sorting
complexities.

Our model of computation is a RAM with $\Theta(\log n)$-size words. The
global complexity is measured as the total number of arithmetic and logical
operations on words.

Entropy and Sorting. We recall that a subset $S$ of vertices of a graph is a stable
set (or independent set) if the vertices in $S$ are pairwise nonadjacent. The stable set
polytope of a graph $G$ with vertex set $V$ and order $n$ is the $n$-dimensional polytope

$$\mathrm{STAB}(G) := \mathrm{conv}\{\chi^S \in \mathbb{R}^V : S \text{ stable set in } G\},$$

where $\chi^S$ is the characteristic vector of the subset $S$, assigning the value 1 to every
vertex in $S$, and 0 to the others. The entropy of $G$ is defined as (see [Körner 1973;
Csiszár et al. 1990])

$$H(G) := \min_{x \in \mathrm{STAB}(G)} -\frac{1}{n} \sum_{v \in V} \log x_v. \qquad (1)$$

Any point $x \in \mathrm{STAB}(G)$ describes a feasible solution of the convex program (1).
The entropy of $x$ is the value of the objective function of that program with respect
to $x$, which we denote by $H(x)$.

For any given poset $P$, we consider two graphs: the comparability graph $G(P)$
and the incomparability graph $\bar{G}(P)$. The vertex set of $G(P)$ is the ground set
of $P$ and two distinct vertices $v$ and $w$ are adjacent in $G(P)$ whenever they are
comparable in $P$. The incomparability graph $\bar{G}(P)$ is simply the complement of
$G(P)$. Following K&K, we denote by $H(P)$ the entropy of $G(P)$ and by $H(\bar{P})$ the
entropy of $\bar{G}(P)$.

Entropy plays an important role in the sorting under partial information problem. 

The first reason is explained by the following result due to K&K. In particular,
it implies $\log e(P) = \Theta(nH(\bar{P}))$. Thus the information-theoretic lower bound and
the entropy of the incomparability graph of $P$ are tightly related.

Lemma 1 [Kahn and Kim 1995]. For any poset $P$ of order $n$,
$\log e(P) \le nH(\bar{P}) \le \min\{\log e(P) + \log e \cdot n,\ c_1 \log e(P)\}$, where $c_1 = 1 + 7 \log e \approx 11.1$.

The second reason is that, while computing $e(P)$ is #P-complete [Brightwell and
Winkler 1991], computing $H(\bar{P})$ can be done in polynomial time by solving the
convex minimization problem (1), as we now explain. When $G = \bar{G}(P)$, the stable set
polytope $\mathrm{STAB}(G)$ has a known description in terms of linear inequalities.
Although the number of inequalities is (in most cases) exponential, the corresponding
separation problem can be solved efficiently. Hence (1) can be solved by the
ellipsoid algorithm. (To be precise, the ellipsoid algorithm will actually approximate
the optimum of (1) to any fixed precision, in polynomial time.)

Much of this favorable behaviour is due to the perfection of $G(P)$. We recall
that a graph $G$ is perfect if $\omega(H) = \chi(H)$ holds for every induced subgraph $H$ of
$G$, where $\omega(H)$ and $\chi(H)$ denote the clique and chromatic numbers of $H$,
respectively. If $G$ is perfect, then its complement $\bar{G}$ is also perfect [Lovász 1972]. The
comparability graph $G(P)$ of $P$ is always perfect, hence the same holds for the
incomparability graph $\bar{G}(P)$ of $P$. The following basic result is a manifestation of
convex programming duality (see for instance [Simonyi 1995] for a proof).

Lemma 2. Assume $G$ is a perfect graph with vertex set $V$ and order $n$, and let
$x \in \mathbb{R}^V$ and $z \in \mathbb{R}^V$ be feasible solutions to (1) for $G$ and $\bar{G}$, respectively. Then $x$
and $z$ are optimal iff $x_v z_v = 1/n$ for all $v \in V$. In particular, $H(G) + H(\bar{G}) = \log n$.

Csiszár et al. [1990] have characterized perfect graphs as the graphs that "split
graph entropy". More precisely, they proved that $G$ is perfect if and only if, for
every probability distribution $p$ on the vertex set of $G$, the sum of the entropies of
$G$ and $\bar{G}$ with respect to $p$ equals the (Shannon) entropy of $p$.

The algorithm of Kahn and Kim [1995] is based on two main lemmas, Lemma 1
above and the next lemma. Whenever $a$ and $b$ are incomparable elements of $P$, we
denote by $P(a < b)$ the poset obtained by adding the relation $(a, b)$ to the partial
order of $P$ and then closing transitively.

Lemma 3 [Kahn and Kim 1995]. In any poset $P$ of order $n$ that is not a chain
there are $a, b$ incomparable such that

$$\max\{nH(\overline{P(a < b)}),\ nH(\overline{P(b < a)})\} \le nH(\bar{P}) - c_2,$$

where $c_2 = \log(1 + 17/112) \approx 0.2$.

The Algorithm of K&K and its Complexity. Let $V$ denote the ground set of $P$.
Given an optimal solution $x \in \mathbb{R}^V$ to (1) for $G(P)$, K&K show how to choose a
pair $a, b$ as in Lemma 3. Knowing the primal solution $x$, this choice can be done
efficiently (in $O(n^2)$ time).

Comparing $a$ and $b$ gives a new partial information $P' \in \{P(a < b), P(b < a)\}$.
The key is that for any outcome, $nH(\bar{P'}) \le nH(\bar{P}) - c_2$. This is proved by modifying
appropriately an optimal dual solution, that is, an optimal solution $z \in \mathbb{R}^V$ to (1)
for $\bar{G}(P)$. By Lemma 2, $z_v = 1/(n x_v)$ for all $v \in V$. Knowing $x$, a new dual
solution $z'$ can be efficiently constructed (in $O(n^3)$ time).

To determine the next comparison, the K&K algorithm needs to compute an
optimal solution $x'$ to (1) for $G(P')$. Because the optimality of $z'$ is not guaranteed,
letting $x'_v = 1/(n z'_v)$ for $v \in V$ does not work. This explains why their algorithm
uses the ellipsoid algorithm before each comparison.

We have shown in [Cardinal et al. 2010a] that $H(P)$ can be expressed via a
convex minimization problem with $2n$ variables and at most $n^2$ constraints, making
possible the use of interior point algorithms for computing $H(P)$ (this alternative
formulation is described in Section 3). Although this makes the K&K algorithm
more practical, this does not make it competitive with our algorithms in terms of
running time since it is unlikely that computing $H(P)$ using interior point
algorithms can be done in less than O(n ) time (plugging in in a straightforward way
the number of variables and constraints in complexity bounds for interior point
algorithms would yield an $O(n^6)$ complexity [Boyd and Vandenberghe 2004]).

Greedy Chain Decompositions. Suppose we want to approximate the entropy
$H(G)$ of a given perfect graph $G$. We have shown [Cardinal et al. 2010a] that
the following greedy heuristic performs very well. First, iteratively remove a
maximum stable set in $G$. Denote by $S_1, \ldots, S_k$ the stable sets extracted from $G$.
Second, construct the greedy point

$$\tilde{x} := \sum_{i=1}^{k} \frac{|S_i|}{n} \chi^{S_i}$$

in $\mathrm{STAB}(G)$. The entropy of this point is

$$H(\tilde{x}) = -\frac{1}{n} \sum_{v \in V} \log \tilde{x}_v = -\sum_{i=1}^{k} \frac{|S_i|}{n} \log \frac{|S_i|}{n}.$$

Note that this is precisely the entropy of the probability distribution
$\left(\frac{|S_1|}{n}, \ldots, \frac{|S_k|}{n}\right)$.
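The identity between the entropy of the greedy point and the entropy of the size distribution can be checked numerically. The Python sketch below is a minimal illustration; the function names are hypothetical, and the stable sets are simply given as lists of vertex labels that are assumed to partition the vertex set.

```python
from math import log2


def greedy_point(stable_sets):
    """Map each vertex v in S_i to the coordinate |S_i| / n."""
    n = sum(len(s) for s in stable_sets)
    return {v: len(s) / n for s in stable_sets for v in s}


def point_entropy(x):
    """Value of the objective -(1/n) * sum_v log2(x_v) at the point x."""
    n = len(x)
    return -sum(log2(xv) for xv in x.values()) / n


def size_distribution_entropy(stable_sets):
    """Shannon entropy (in bits) of the distribution (|S_1|/n, ..., |S_k|/n)."""
    n = sum(len(s) for s in stable_sets)
    return -sum(len(s) / n * log2(len(s) / n) for s in stable_sets)
```

For example, with stable sets of sizes 4 and 2 on six vertices, both functions return the entropy of the distribution $(2/3, 1/3)$, about 0.918 bits.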
Theorem 1 [Cardinal et al. 2010a]. Let $G$ be a perfect graph on $n$ vertices
and let $\tilde{x}$ be an arbitrary greedy point in $\mathrm{STAB}(G)$. Then, for every $\varepsilon > 0$,

$$H(\tilde{x}) \le (1 + \varepsilon) H(G) + (1 + \varepsilon) \log\left(1 + \frac{1}{\varepsilon}\right).$$

In the context of the sorting under partial information problem, we apply the
greedy heuristic to $\bar{G}(P)$. This gives a decomposition of $P$ into chains $C_1, \ldots, C_k$
that we call a greedy chain decomposition. Although the fastest known algorithm
for computing a maximum chain in a poset of order $n$ has complexity $O(n^2)$ (see
[Golumbic 2004], Chapter 5), a greedy chain decomposition can be found in $O(n^{2.5})$
time, see Appendix A.
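The definition of a greedy chain decomposition can be sketched directly: repeatedly extract a longest chain among the remaining elements. The Python code below is a naive illustration and not the $O(n^{2.5})$ procedure of Appendix A. It assumes, as our own convention, that the poset is given as a dictionary `succ` mapping each element to the set of all elements strictly above it (transitively closed); the function names are hypothetical.

```python
def longest_chain(remaining, succ):
    """Return a longest chain among `remaining`, listed from bottom to top.

    `succ[v]` must contain every element strictly greater than v
    (transitive closure), so that chains survive element removals.
    """
    best = {}  # best[v] = a longest chain starting at v

    def chain_from(v):
        if v not in best:
            tails = [chain_from(w) for w in succ[v] if w in remaining]
            best[v] = [v] + (max(tails, key=len) if tails else [])
        return best[v]

    return max((chain_from(v) for v in remaining), key=len)


def greedy_chain_decomposition(succ):
    """Partition the poset into chains by iteratively removing a longest chain."""
    remaining = set(succ)
    chains = []
    while remaining:
        chain = longest_chain(remaining, succ)
        chains.append(chain)
        remaining -= set(chain)
    return chains
```

The recursion depth is bounded by the length of the longest chain, so this sketch is meant for small posets only. The chain sizes it returns are exactly the $|C_i|$ entering the greedy point.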

3. A TIGHT BOUND ON THE ENTROPY OF AN INCOMPARABILITY GRAPH 

K&K conjectured that the value for the constant $c_1$ in Lemma 1 could be improved
to $c_1 = 1 + \log e \approx 2.44$. We show that one can actually take $c_1 = 2$, which is best
possible, as shown by the poset consisting of two incomparable elements.

Theorem 2. For any poset $P$ of order $n$,

$$nH(\bar{P}) \le 2 \log e(P).$$

Before proving this result, we give an equivalent definition of the entropy of a 
poset in terms of consistent collections of intervals, that is used crucially in our 
proof of Theorem 2. 

We say that a collection of open intervals $\{(y_{v-}, y_{v+})\}_{v \in V}$, each of which is
contained in the interval $(0, 1)$, is consistent with $P$ if $v <_P w$ implies that the
interval for $v$ is entirely to the left of the interval for $w$, that is, $y_{v+} \le y_{w-}$. We
denote by $\mathcal{I}(P)$ the set of all such collections of intervals.

As is easily seen [Cardinal et al. 2010a], $H(P)$ equals the minimum of

$$-\frac{1}{n} \sum_{v \in V} \log x_v$$

over all vectors $x \in \mathbb{R}^V$ such that there exists a collection of intervals in $\mathcal{I}(P)$ where,
for each $v \in V$, the interval for $v$ has length $x_v$. In other words, the following lemma
holds.

Lemma 4. Let $P$ be a poset of order $n$ with ground set $V$. Then, we have

$$H(P) = \min\left\{-\frac{1}{n} \sum_{v \in V} \log x_v : \exists \{(y_{v-}, y_{v+})\}_{v \in V} \in \mathcal{I}(P) \text{ s.t. } \forall v \in V : x_v = y_{v+} - y_{v-}\right\}.$$

This new definition of the entropy of a poset has some advantages. 

First, it yields a convex program with $2n$ variables and at most $n^2$ constraints for
computing the entropy. This shows that the entropy of a poset can be computed
with interior point algorithms.

Second, it gives a more intuitive framework to reason about the entropy of a 
poset. As an illustration we sketch short proofs of two results by Kahn and Kim 
[1995]. 

In order to show that $\log e(P) \le nH(\bar{P}) \le \log e(P) + \log e \cdot n$, the "easy part"
of Lemma 1, K&K consider an optimal solution $x$ to (1). Because $x$ is feasible, it


Sorting under Partial Information • 9 



defines a box that is contained in $\mathrm{STAB}(G(P))$. (The defining property of this box
is that it has $x$ and the origin as opposite vertices.) Because $x$ is optimal, it yields
a simplex that contains $\mathrm{STAB}(G(P))$. Thus the box is contained in $\mathrm{STAB}(G(P))$,
which is contained in the simplex. This gives inequalities between the volumes
of these polytopes. The volume of the box is $2^{-nH(P)}$ and that of the simplex
is $(n^n/n!)\, 2^{-nH(P)}$. By invoking a beautiful result of Stanley [1986] relating the
volume of $\mathrm{STAB}(G(P))$ to $e(P)$, and also $H(P) + H(\bar{P}) = \log n$ (see Lemma 2),
K&K derive the desired inequalities. Stanley [1986] proves that the stable set
polytope $\mathrm{STAB}(G(P))$ and the order polytope

$$\mathcal{O}(P) := \{x \in \mathbb{R}^V : x \in [0, 1]^V,\ x_v \le x_w \text{ whenever } v \le_P w\}$$

have the same volume. Because $\mathcal{O}(P)$ canonically decomposes into $e(P)$ simplices,
of volume $1/n!$ each, one obtains that the volume of $\mathcal{O}(P)$, and thus $\mathrm{STAB}(G(P))$,
is precisely $e(P)/n!$.

Now let $\{(y_{v-}, y_{v+})\}_{v \in V}$ denote any optimal collection of intervals consistent
with $P$. These intervals define another box of volume $2^{-nH(P)}$, this time contained
in $\mathcal{O}(P)$. This directly implies $nH(\bar{P}) \le \log e(P) + \log e \cdot n$, without using Stanley's
result. We do not know such a direct proof for the inequality $\log e(P) \le nH(\bar{P})$.

Forcing any algorithm for sorting under partial information to perform
$\frac{1}{2} nH(\bar{P}) \ge \frac{1}{2} \log e(P)$ comparisons can be done as follows. Initially, compute an
optimal collection of intervals $\{(y_{v-}, y_{v+})\}_{v \in V}$ consistent with $P$. When faced with
the query "is $a < b$?", answer "yes" if and only if $m_a < m_b$ in the current collection
of intervals, where $m_v$ denotes the midpoint of the interval for $v \in V$. If the answer
is "yes" (and $a \ne b$), replace the interval for $a$ by $(y_{a-}, m_a)$ and the interval for $b$
by $(m_b, y_{b+})$. If the answer is "no", replace the interval for $a$ by $(m_a, y_{a+})$ and the
interval for $b$ by $(y_{b-}, m_b)$. Since $H(P) + H(\bar{P}) = \log n$, such an answer guarantees
that each comparison decreases $nH(\bar{P})$ by at most 2. Therefore, the number of
comparisons performed is at least $\frac{1}{2} nH(\bar{P})$.
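The adversary strategy just described is easy to simulate. The Python sketch below is an illustration under stated assumptions: the class `IntervalAdversary` and the helper `sort_against_adversary` are hypothetical names, and we start from an antichain, for which the collection assigning the interval $(0, 1)$ to every element is optimal, so that no convex program has to be solved. In that case $nH(\bar{P}) = n \log n$, and any comparison sort is forced to make at least $\frac{1}{2} n \log n$ queries.

```python
from functools import cmp_to_key


class IntervalAdversary:
    """Answers queries "is a < b?" by comparing interval midpoints."""

    def __init__(self, intervals):
        # intervals: dict element -> (left endpoint, right endpoint)
        self.intervals = dict(intervals)
        self.queries = 0

    def midpoint(self, v):
        lo, hi = self.intervals[v]
        return (lo + hi) / 2

    def is_less(self, a, b):
        """Answer the query for distinct a, b and halve both intervals."""
        self.queries += 1
        ma, mb = self.midpoint(a), self.midpoint(b)
        a_lo, a_hi = self.intervals[a]
        b_lo, b_hi = self.intervals[b]
        if ma < mb:
            # "yes": keep the left half for a and the right half for b
            self.intervals[a] = (a_lo, ma)
            self.intervals[b] = (mb, b_hi)
            return True
        # "no": keep the right half for a and the left half for b
        self.intervals[a] = (ma, a_hi)
        self.intervals[b] = (b_lo, mb)
        return False


def sort_against_adversary(n):
    """Sort an antichain on n elements with Python's built-in sort."""
    adversary = IntervalAdversary({v: (0.0, 1.0) for v in range(n)})
    compare = lambda a, b: -1 if adversary.is_less(a, b) else 1
    order = sorted(range(n), key=cmp_to_key(compare))
    return order, adversary
```

After the sort terminates, the intervals are pairwise disjoint and their left-to-right order agrees with the returned order, which reflects the fact that the adversary's answers are always consistent with some linear extension.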

We now prove Theorem 2. 

PROOF of Theorem 2. The proof is by induction on $n$ and, for $n$ fixed, on the
number of incomparabilities in $P$. The result being true for $n = 1$, we assume
$n \ge 2$. Consider an optimal vector $x \in \mathbb{R}^V$ and corresponding collection of open
intervals $\{(y_{v-}, y_{v+})\}_{v \in V}$. Let $a \in V$ be such that $y_{a+}$ is maximum.

If $a$ is comparable to all elements of $V$, then the induction hypothesis implies

$$nH(\bar{P}) = (n - 1) H(\overline{P - a}) \le 2 \log e(P - a) = 2 \log e(P).$$

Hence, we may assume that $a$ is incomparable to some element in $V$. Let $b$
be such an element with $y_{b+}$ maximum. Clearly, $y_{b+} \le y_{a+}$. In fact, it must be
that $y_{b+} = y_{a+}$. Indeed, by our choice of $a$ and $b$, we have $y_{c+} \le y_{b+}$ for every
$c \in V - \{a, b\}$. Thus, if $y_{b+} < y_{a+}$, then one could extend to the right the interval
corresponding to $b$ by an amount of $y_{a+} - y_{b+}$ and still have a collection of intervals
consistent with $P$. However, this new collection defines a corresponding vector
$x' \in \mathbb{R}^V$ such that

$$-\frac{1}{n} \sum_{v \in V} \log x'_v = -\frac{1}{n} \sum_{v \in V} \log x_v + \frac{1}{n} (\log x_b - \log x'_b) < -\frac{1}{n} \sum_{v \in V} \log x_v,$$

contradicting the optimality of $x$.

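Exchanging $a$ and $b$ if necessary, we may assume that $x_a \ge x_b$. By shortening
the intervals of $a$ and $b$ in two different ways, we will define two collections of
open intervals $\{(y^1_{v-}, y^1_{v+})\}_{v \in V}$ and $\{(y^2_{v-}, y^2_{v+})\}_{v \in V}$. In the first one, we will have
$y^1_{a+} \le y^1_{b-}$, while for the second $y^2_{b+} \le y^2_{a-}$ will hold. To this aim, we introduce a
few quantities.

Let $\lambda := x_b / x_a$ (thus, $\lambda \in [0, 1]$). Let

$$\alpha_1 := \begin{cases} \frac{1}{1 - \lambda} & \text{if } \lambda \le \frac{1}{2} \\ 2 & \text{otherwise} \end{cases} \qquad \beta_1 := \begin{cases} 1 & \text{if } \lambda \le \frac{1}{2} \\ 2\lambda & \text{otherwise} \end{cases}$$

and

$$\alpha_2 := 2/\lambda, \qquad \beta_2 := 2.$$

The collection $\{(y^1_{v-}, y^1_{v+})\}_{v \in V}$ equals $\{(y_{v-}, y_{v+})\}_{v \in V}$, except that

$$y^1_{a+} := y_{a-} + \frac{x_a}{\alpha_1}, \qquad y^1_{b-} := y_{b+} - \frac{x_b}{\beta_1}.$$

Similarly, $\{(y^2_{v-}, y^2_{v+})\}_{v \in V}$ equals $\{(y_{v-}, y_{v+})\}_{v \in V}$ with the following two
exceptions:

$$y^2_{a-} := y_{a+} - \frac{x_a}{\alpha_2}, \qquad y^2_{b+} := y_{b-} + \frac{x_b}{\beta_2}.$$

Let $P_i$ ($i = 1, 2$) be the interval order defined by $\{(y^i_{v-}, y^i_{v+})\}_{v \in V}$, with $v <_{P_i} w$
whenever $y^i_{v+} \le y^i_{w-}$. Clearly, both $P_1$ and $P_2$ extend $P$.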
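We claim that there exists an index $i \in \{1, 2\}$ such that

$$\frac{e(P_i)}{e(P)} \le \frac{1}{\sqrt{\alpha_i \beta_i}}. \qquad (2)$$

This is proved below. Assuming that the claim is correct, let $x' \in \mathbb{R}^V$ be the vector
defined by the collection of open intervals $\{(y^i_{v-}, y^i_{v+})\}_{v \in V}$. This vector gives an
upper bound on the entropy of $P_i$, namely

$$H(P_i) \le -\frac{1}{n} \sum_{v \in V} \log x'_v = -\frac{1}{n} \sum_{v \in V} \log x_v + \frac{1}{n} \log \alpha_i + \frac{1}{n} \log \beta_i.$$

Hence,

$$nH(P_i) \le nH(P) + \log(\alpha_i \beta_i). \qquad (3)$$

Using (2), (3), and the induction hypothesis on $P_i$, we obtain

$$\begin{aligned}
nH(\bar{P}) &= n \log n - nH(P) \\
&\le n \log n - nH(P_i) + \log(\alpha_i \beta_i) \\
&= nH(\bar{P_i}) + \log(\alpha_i \beta_i) \\
&\le 2 \log e(P_i) + \log(\alpha_i \beta_i) \\
&\le 2 \log \frac{e(P)}{\sqrt{\alpha_i \beta_i}} + \log(\alpha_i \beta_i) \\
&= 2 \log e(P).
\end{aligned}$$

In order to prove the claim, we show the following two inequalities:

$$\frac{e(P_1)}{e(P)} + \frac{e(P_2)}{e(P)} \le 1, \qquad (4)$$

$$\frac{1}{\sqrt{\alpha_1 \beta_1}} + \frac{1}{\sqrt{\alpha_2 \beta_2}} \ge 1. \qquad (5)$$

Together, (4) and (5) imply that (2) holds for at least one index $i$.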

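For proving (4), it is enough to show $a <_{P_1} b$ and $a >_{P_2} b$. By definition of $\alpha_1$
and $\beta_1$, we have

$$y^1_{a+} = y_{a-} + \frac{x_a}{\alpha_1} = \begin{cases} y_{a-} + x_a - x_b & \text{if } \lambda \le \frac{1}{2} \\ y_{a-} + x_a/2 & \text{otherwise} \end{cases}$$

and (using $y_{a+} = y_{b+}$)

$$y^1_{b-} = y_{b+} - \frac{x_b}{\beta_1} = y_{a-} + x_a - \frac{x_b}{\beta_1} = \begin{cases} y_{a-} + x_a - x_b & \text{if } \lambda \le \frac{1}{2} \\ y_{a-} + x_a/2 & \text{otherwise.} \end{cases}$$

Hence, $a <_{P_1} b$. Also,

$$y^2_{a-} = y_{a+} - \frac{x_a}{\alpha_2} = y_{a+} - x_b/2$$

and

$$y^2_{b+} = y_{b-} + \frac{x_b}{\beta_2} = y_{a+} - x_b + \frac{x_b}{\beta_2} = y_{a+} - x_b/2,$$

implying $a >_{P_2} b$. Therefore, (4) holds true.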

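We proceed and show (5). The left-hand side of (5) is a function of $\lambda$, which we
denote by $f(\lambda)$ for short. We have

$$f(\lambda) = \begin{cases} \sqrt{1 - \lambda} + \frac{\sqrt{\lambda}}{2} & \text{if } \lambda \le \frac{1}{2} \\ \frac{1}{2\sqrt{\lambda}} + \frac{\sqrt{\lambda}}{2} & \text{otherwise.} \end{cases}$$

(Note that $\sqrt{1 - \lambda} + \frac{\sqrt{\lambda}}{2} = \frac{1}{2\sqrt{\lambda}} + \frac{\sqrt{\lambda}}{2}$ for $\lambda = \frac{1}{2}$.) The first derivative of $f(\lambda)$ is

$$f'(\lambda) = \begin{cases} \frac{1}{4\sqrt{\lambda}} - \frac{1}{2\sqrt{1 - \lambda}} & \text{if } \lambda < \frac{1}{2} \\ \frac{1}{4\sqrt{\lambda}} - \frac{1}{4\lambda^{3/2}} & \text{otherwise.} \end{cases}$$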



12 • Cardinal, Fiorini, Joret, Jungers, Munro 



As the reader will easily check, $f'$ is positive over the open interval $(0, \frac{1}{5})$ and
negative over $(\frac{1}{5}, 1)$. Since $f(0) = f(1) = 1$, we deduce that $f(\lambda) \ge 1$ for every
$\lambda \in [0, 1]$, as claimed. This concludes the proof. □
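Inequality (5) can also be checked numerically. The short Python sketch below evaluates the left-hand side of (5), written as a function `f` of $\lambda$ using the values of $\alpha_1, \beta_1, \alpha_2, \beta_2$ given in the proof, on a uniform grid of $[0, 1]$. The grid resolution and the variable names are arbitrary choices made for this illustration; the check is a sanity test and not a substitute for the argument above.

```python
from math import sqrt


def f(lam):
    """Left-hand side of (5): 1/sqrt(a1*b1) + 1/sqrt(a2*b2) as a function of lambda."""
    if lam <= 0.5:
        # alpha_1 = 1/(1 - lambda), beta_1 = 1
        first = sqrt(1 - lam)
    else:
        # alpha_1 = 2, beta_1 = 2 * lambda
        first = 1 / (2 * sqrt(lam))
    # alpha_2 = 2 / lambda, beta_2 = 2
    second = sqrt(lam) / 2
    return first + second


grid = [i / 1000 for i in range(1001)]
values = [f(lam) for lam in grid]
lowest = min(values)
argmax = grid[values.index(max(values))]
```

On this grid the minimum is attained at the endpoints $\lambda = 0$ and $\lambda = 1$, where $f = 1$, and the maximum is attained at $\lambda = 1/5$, where $f = \sqrt{5}/2$.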

Finally, we sketch a simple proof of the weaker inequality $nH(\bar{P}) \le 4 \log e(P)$.
We follow the same proof structure as above. Instead of picking $a \in V$ such that
$y_{a+}$ is maximum, pick $a \in V$ such that $x_a$ is maximum. Then pick $b \in V - \{a\}$ such
that the interval for $b$ contains the midpoint of the interval for $a$. If $e(P(a < b)) \le
e(P)/2$, then define $x'$ by replacing the interval for $a$ by its first quarter and the
interval for $b$ by its last quarter. Otherwise, we have $e(P(b < a)) \le e(P)/2$. In this
case, define $x'$ by replacing the interval for $a$ by its last quarter and the interval for
$b$ by its first quarter.

4. INSERTION SORT 

We first propose an $O(n^2)$ sorting algorithm with query complexity $O(\log n \cdot
\log e(P))$. It consists of first finding a maximum chain $C \subseteq P$, then iteratively
inserting the remaining elements of $P - C$ in the chain $C$, using binary search, see
Algorithm 1. In order to show that its query complexity is $O(\log n \cdot \log e(P))$, we
need two lemmas.



Algorithm 1 "Insertion sort"-like algorithm for sorting under partial information
{Phase 1 (preprocessing)}
find a maximum chain $C \subseteq P$
{Phase 2 (sorting)}
while $P - C \ne \emptyset$ do
    remove an element of $P - C$ and insert it in $C$ with a binary search
end while
return $C$



Lemma 5. Let $P$ be a poset of order $n$ and let $C$ be a maximum chain in $P$.
Then $|C| \ge 2^{-H(\bar{P})} n$.

Proof. It is well known ([Körner 1986; Cardinal et al. 2008]) that the entropy
of a graph on $n$ vertices with stability number $\alpha$ is at least $-\log \frac{\alpha}{n}$. The result
follows by applying this to $\bar{G}(P)$. □

Lemma 6. For all $x \in \mathbb{R}$, $1 - 2^{-x} \le \ln 2 \cdot x$.

Now the number of comparisons performed by the algorithm is at most

$$\begin{aligned}
\log n \cdot (n - |C|) &\le \log n \cdot (n - 2^{-H(\bar{P})} n) && \text{(Lemma 5)} \\
&\le \log n \cdot \ln 2 \cdot nH(\bar{P}) && \text{(Lemma 6)} \\
&= O(\log n \cdot \log e(P)) && \text{(Lemma 1).}
\end{aligned}$$

This algorithm has the property that we can perform the preprocessing step only
once, and sort all instances with the same partial information in time $O(\log n \cdot
\log e(P)) + O(n)$. To achieve this, we store the maximum chain $C$ in a balanced
binary search tree in time $O(n)$ and insert each remaining element in time $O(\log n)$.
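The sorting phase of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under our own assumptions: the function name `insertion_sort_partial`, the oracle `less` (which answers a comparison according to the hidden linear order) and the choice of a plain Python list are hypothetical. In particular, inserting into a list costs linear time, so the sketch reproduces the number of comparisons of Algorithm 1 but not the $O(\log n)$ insertion time obtained with a balanced binary search tree.

```python
def insertion_sort_partial(chain, others, less):
    """Insert each element of `others` into the sorted list `chain`.

    `chain` lists a maximum chain from smallest to largest,
    `others` holds the remaining elements, and `less(u, v)` queries
    the hidden linear order. Returns (sorted list, comparisons used).
    """
    order = list(chain)
    comparisons = 0
    for v in others:
        lo, hi = 0, len(order)
        # Binary search for the position of v in the current chain.
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if less(order[mid], v):
                lo = mid + 1
            else:
                hi = mid
        order.insert(lo, v)
    return order, comparisons
```

Each insertion into a chain of $m$ elements uses at most $\lfloor \log m \rfloor + 1$ comparisons, which matches the $\log n \cdot (n - |C|)$ accounting used above up to rounding.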
Fig. 2. Illustration of Algorithm 2.

5. MERGE SORT 

In order to improve on the previous algorithm, we use an approach similar to merge 
sort, see Algorithm 2. This algorithm is illustrated in Figure 2. 

Algorithm 2 "Merge sort"-like algorithm for sorting under partial information
{Phase 1 (preprocessing)}
find a greedy chain decomposition $C_1, \ldots, C_k$ of $P$
$\mathcal{C} \leftarrow \{C_1, \ldots, C_k\}$
{Phase 2 (sorting)}
while $|\mathcal{C}| > 1$ do
    pick the two smallest chains $C$ and $C'$ in $\mathcal{C}$
    merge $C$ and $C'$ into a chain $C''$, in linear time
    remove $C$ and $C'$ from $\mathcal{C}$, and replace them by $C''$
end while
return the unique chain in $\mathcal{C}$



Let $g$ denote the entropy of the probability distribution $\{|C_1|/n, \dots, |C_k|/n\}$, the distribution of the sizes of the chains in the greedy chain decomposition. Our next lemma bounds the query complexity of Algorithm 2 in terms of $g$.

Lemma 7. The query complexity of Algorithm 2 is at most $(g + 1)n$.

Proof. Phase 2 of Algorithm 2 is a multiway merge of the chains $C_i$ extracted from $P$. The two smallest chains are iteratively merged, thereby forming a Huffman tree: this is a known strategy for merging sorted sequences of different lengths (see for instance [Frazer and Bennett 1972]). Huffman codes have average codeword length within one bit of the entropy [Cover and Thomas 2006]. Hence the average root-to-leaf distance in the tree with respect to the distribution $\{|C_1|/n, \dots, |C_k|/n\}$ is at most $g + 1$.

Merging two chains is done in linear time by iteratively choosing the minimum. Consider an element of the chain $C_i$. In the worst case, this element is compared at every node of the path from the leaf node corresponding to $C_i$ to the root of the tree. Every time we compare two elements at one node of the tree, we charge the comparison to the element that is selected. We denote by $\ell_i$ the length of the path to the $i$th chain $C_i$. Summing over all the elements, we get:

$$\sum_i \ell_i |C_i| = n \sum_i \ell_i \frac{|C_i|}{n} \leq (g + 1) n,$$

proving the lemma. □
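The Huffman-order merging of Phase 2 and the bound of Lemma 7 can be illustrated with a short sketch. It assumes the chain decomposition is already given as non-empty sorted lists, and that the hidden linear order is the usual order on integers; `merge_two`, `huffman_merge` and `size_entropy` are hypothetical helper names, not part of the paper.

```python
import heapq
import math

def merge_two(c1, c2):
    """Standard linear merge; returns (merged chain, comparisons used)."""
    out, i, j, used = [], 0, 0, 0
    while i < len(c1) and j < len(c2):
        used += 1
        if c1[i] < c2[j]:
            out.append(c1[i]); i += 1
        else:
            out.append(c2[j]); j += 1
    out.extend(c1[i:]); out.extend(c2[j:])
    return out, used

def huffman_merge(chains):
    """Repeatedly merge the two smallest chains, as in Phase 2 of Algorithm 2."""
    # The integer in the middle breaks ties so that lists are never compared.
    heap = [(len(c), k, c) for k, c in enumerate(chains)]
    heapq.heapify(heap)
    tiebreak, total = len(chains), 0
    while len(heap) > 1:
        _, _, c1 = heapq.heappop(heap)
        _, _, c2 = heapq.heappop(heap)
        merged, used = merge_two(c1, c2)
        total += used
        heapq.heappush(heap, (len(merged), tiebreak, merged))
        tiebreak += 1
    return heap[0][2], total

def size_entropy(chains):
    """Entropy g of the distribution of chain sizes (binary logarithm)."""
    n = sum(len(c) for c in chains)
    return -sum(len(c) / n * math.log2(len(c) / n) for c in chains)

chains = [[1, 4, 7, 10], [2, 5], [3], [6, 8, 9]]
merged, q = huffman_merge(chains)
g, n = size_entropy(chains), len(merged)
print(merged, q, (g + 1) * n)
```

The count `q` stays below $(g+1)n$ on this instance, in line with the lemma.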

The following theorem uses this bound and Theorem 1. 

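Theorem 3. For any $\varepsilon > 0$, the query complexity of Algorithm 2 is at most

$$(1 + \varepsilon) \log e(P) + (1 + \varepsilon) \left( \log e + \log\left(1 + \tfrac{1}{\varepsilon}\right) \right) n + n = (1 + \varepsilon) \log e(P) + O_\varepsilon(n).$$

Proof. From Lemma 7, we infer that the query complexity is at most

$$
\begin{aligned}
(g + 1) n &\leq (1 + \varepsilon) nH(P) + (1 + \varepsilon) \log\left(1 + \tfrac{1}{\varepsilon}\right) n + n && \text{(Theorem 1)} \\
&\leq (1 + \varepsilon) (\log e(P) + \log e \cdot n) + (1 + \varepsilon) \log\left(1 + \tfrac{1}{\varepsilon}\right) n + n && \text{(Lemma 1).}
\end{aligned}
$$

□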

We conclude that Algorithm 2 is an $O(n^{2.5})$ algorithm with query complexity at most $(1 + \varepsilon) \log e(P) + O_\varepsilon(n)$, for any $\varepsilon > 0$. Note that, again, we can reuse the chain decomposition obtained in the preprocessing phase for sorting any instance with the same partial information in time proportional to the query complexity.

6. CAUTIOUS MERGE SORT 

The query complexity of Algorithm 2 is not $O(\log e(P))$ because it completely ignores a large part of the partial information. We now show that using the partial information for the last merge suffices to obtain an algorithm with query complexity $O(\log e(P))$.

The subproblem at hand is that of merging under partial information. It is a 
special case of sorting under partial information, in which the given poset P is 
covered by two chains A and B, that is, P is of width at most 2. (The width of a 
poset P is the maximum size of an antichain of P.) 

That problem was studied by Linial [1984], who proposed an algorithm with query complexity $O(\log e(P))$. However, this algorithm requires computing polynomially many $\Theta(n) \times \Theta(n)$ determinants. In Section 7, we obtain an $O(n^2 \log^2 n)$ algorithm for the problem with query complexity at most $6 \log e(P)$.

Theorem 4. Suppose there exists an algorithm for the problem of merging under partial information with query complexity at most $c_3 \log e(P')$, given as partial information a poset $P'$ of order $n$ and width at most 2. Then there exists an algorithm for the problem of sorting under partial information with query complexity at most $(9.09 + c_3) \log e(P)$.

Proof. Let Algorithm 6 be the hypothesized algorithm for merging under partial information. (Such an algorithm will be given in Section 7.) Consider the following algorithm, illustrated in Figure 3.

Algorithm 3 Improved "merge sort"-like algorithm for sorting under partial information
1: find a maximum chain $A \subseteq P$
2: apply Algorithm 2 to the poset $P - A$, yielding a chain $B$
3: apply Algorithm 6 to the current partial information $P'$
4: return the resulting chain




Fig. 3. Illustration of Algorithm 3. 

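From Lemma 5, we have $|A| \geq 2^{-H(P)} n$, and therefore (using Lemma 6):

$$|B| = |P - A| \leq n(1 - 2^{-H(P)}) \leq \ln 2 \cdot nH(P). \tag{6}$$

Now from Theorem 3 the number of comparisons in lines 2 and 3 of Algorithm 3 is at most

$$
\begin{aligned}
&(1 + \varepsilon) \log e(P - A) + \left( (1 + \varepsilon) \left( \log e + \log\left(1 + \tfrac{1}{\varepsilon}\right) \right) + 1 \right) |P - A| + c_3 \log e(P') \\
&\quad\leq (1 + \varepsilon) \log e(P) + \left( (1 + \varepsilon) \left( 1 + \ln\left(1 + \tfrac{1}{\varepsilon}\right) \right) + \ln 2 \right) nH(P) + c_3 \log e(P) && \text{(from (6))} \\
&\quad\leq \left( 1 + \varepsilon + 2 \left( (1 + \varepsilon) \left( 1 + \ln\left(1 + \tfrac{1}{\varepsilon}\right) \right) + \ln 2 \right) + c_3 \right) \log e(P) && \text{(Theorem 2)} \\
&\quad\leq (9.09 + c_3) \log e(P) && \text{(taking } \varepsilon = 0.35\text{).}
\end{aligned}
$$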
□ 
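The numerical constant in the last step can be checked directly. The short sketch below simply evaluates the coefficient of $\log e(P)$ from the penultimate line (without the $c_3$ term); the function name `sorting_constant` is hypothetical.

```python
import math

def sorting_constant(eps):
    """Coefficient 1 + eps + 2((1 + eps)(1 + ln(1 + 1/eps)) + ln 2)."""
    inner = (1 + eps) * (1 + math.log(1 + 1 / eps)) + math.log(2)
    return 1 + eps + 2 * inner

value = sorting_constant(0.35)
print(round(value, 3))
```

For $\varepsilon = 0.35$ the value is roughly 9.08, which is consistent with the stated bound of 9.09.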

Assuming that the global complexity of Algorithm 6 is $O(T(n))$, the results of the previous section imply that the global complexity of Algorithm 3 is $O(n^{2.5}) + O(T(n))$.

7. MERGING UNDER PARTIAL INFORMATION 

In this section, we assume that $P$ is covered by two disjoint chains, denoted by $A$ and $B$. First, we describe a structural result by Körner and Marton [1988] concerning the entropy of a bipartite graph. Second, we show how to use this to obtain an $O(n^2 \log^2 n)$ algorithm with query complexity at most $c_3 \log e(P)$, with $c_3 = 6$.

Before proceeding, we state a lemma providing several properties of the incomparability graph of $P$ that we use repeatedly below. The proof is straightforward and thus omitted.

Lemma 8. Let $P$ be a poset covered by two disjoint chains $A$ and $B$, and let $G = G(P)$. Then:

(i) The graph $G$ is bipartite, with bipartition $A$, $B$;

(ii) The neighborhood $N(u)$ of any vertex $u$ in $G$ is an interval in the opposite chain (thus, $G$ is biconvex);

(iii) Consider two vertices $u$ and $v$ in the same chain, say $A$, and such that $u \leq_P v$. Let $[c_u, d_u]$ and $[c_v, d_v]$ denote the intervals of $B$ defined by $N(u)$ and $N(v)$, respectively. Then, we have $c_u \leq_P c_v$ and $d_u \leq_P d_v$. In particular, $N(w)$ is contained in the interval $[c_u, d_v]$ of $B$ with endpoints $c_u$ and $d_v$, whenever $w$ belongs to $A$ and $u <_P w <_P v$.

7.1 The Entropy of Bipartite Graphs 

As noted above in Lemma 8(i), the incomparability graph $G(P)$ of $P$ is a bipartite graph. Körner and Marton [1988] describe a method for computing the entropy of any bipartite graph (see Theorem 3.8 in Simonyi's survey on graph entropy [Simonyi 1995]). Below, $h$ denotes the binary entropy function. Thus, we have $h(\xi) = -\xi \log \xi - (1 - \xi) \log(1 - \xi)$ for $\xi \in (0, 1)$, and $h(0) = h(1) = 0$.

Theorem 5 [Körner and Marton 1988]. Let $G$ be a bipartite graph of order $n$, with bipartition $A$, $B$. Then, one can find partitions $A = A_1 \cup \dots \cup A_k$ and $B = B_1 \cup \dots \cup B_k$ such that

$$H(G) = \sum_{i=1}^{k} \frac{|A_i| + |B_i|}{n} \, h\left( \frac{|A_i|}{|A_i| + |B_i|} \right).$$



The partitions are constructed iteratively. Let $N_G(X)$ denote the neighborhood of a set $X$ of vertices in the graph $G$. For $i \in \{1, \dots, k\}$, Körner and Marton define $A_i$ as any subset of $A' := A - A_1 - \dots - A_{i-1}$ that maximizes

$$\frac{|A_i|}{|N_{G'}(A_i)|} \tag{7}$$

in the graph $G'$ obtained from $G$ by removing all vertices contained in some $A_j$ or some $B_j$ with $j < i$, and define $B_i$ as $N_{G'}(A_i)$. By convention, if there is a vertex $u$ in $A'$ that is isolated, then we let $A_i = \{u\}$ and $B_i = \emptyset$. If $A'$ is empty and $B' := B - B_1 - \dots - B_{i-1}$ is not, we pick a vertex $v$ in $B'$, and let $A_i = \emptyset$ and $B_i = \{v\}$.

Given partitions as in Theorem 5, the point $x \in \mathrm{STAB}(G)$ achieving the minimum in (1) is obtained by simply letting

$$x_u := \frac{|A_i|}{|A_i| + |B_i|} \text{ whenever } u \in A_i, \quad \text{and} \quad x_v := \frac{|B_i|}{|A_i| + |B_i|} \text{ whenever } v \in B_i.$$
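As a sanity check, the value given by Theorem 5 agrees with the entropy of the point $x$ defined above. The sketch below assumes that the objective in (1) is $-\frac{1}{n} \sum_{v} \log x_v$ (the convex program itself is stated earlier in the paper, outside this section), and takes the partition as a list of hypothetical size pairs $(|A_i|, |B_i|)$ with $|A_i| + |B_i| > 0$; the function names are illustrative only.

```python
import math

def h(p):
    """Binary entropy function, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_from_partition(parts):
    """Right-hand side of Theorem 5 for parts = [(|A_i|, |B_i|), ...]."""
    n = sum(a + b for a, b in parts)
    return sum((a + b) / n * h(a / (a + b)) for a, b in parts)

def entropy_of_point(parts):
    """Assumed objective -(1/n) sum_v log x_v at the point x defined above."""
    n = sum(a + b for a, b in parts)
    total = 0.0
    for a, b in parts:
        if a:
            total -= a * math.log2(a / (a + b))   # vertices of A_i
        if b:
            total -= b * math.log2(b / (a + b))   # vertices of B_i
    return total / n

parts = [(2, 1), (1, 3), (1, 0)]   # the last pair is an isolated vertex of A
print(entropy_from_partition(parts), entropy_of_point(parts))
```

Both functions return the same value on any such list, since each class contributes $\frac{|A_i| + |B_i|}{n} h\bigl(\frac{|A_i|}{|A_i| + |B_i|}\bigr)$ in either formulation.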

7.2 Local Optimality and Rebalancing

Let $G = G(P)$, and let $E$ denote the edge set of $G$. Because $G$ is bipartite,

$$\mathrm{STAB}(G) = \{x \in \mathbb{R}^V : x_u + x_v \leq 1 \ \forall uv \in E, \ 0 \leq x_v \leq 1 \ \forall v \in V\}.$$

Consider a point $x$ in $\mathrm{STAB}(G)$. The point $x$ is a feasible solution of the convex program (1). If $x_v = 0$ for some vertex $v \in V$, then the objective function value is infinite, and the point $x$ is useless. In order to prevent this, we mostly consider points in $\mathrm{STAB}^*(G) := \mathrm{STAB}(G) \cap \{x \in \mathbb{R}^V : x_v > 0 \ \forall v \in V\}$.

An edge $uv \in E$ is said to be tight with respect to the solution $x$ if $x_u + x_v = 1$. Let $G(x)$ denote the graph whose vertices are those of $G$ and whose edges are the edges of $G$ that are tight.

We begin with a lemma about edges that 'cross', which governs much of the structure of $G(x)$. We say that two edges $uv$ and $u'v'$ of $G$, with $u, u' \in A$ and $v, v' \in B$, cross if $u <_P u'$ and $v' <_P v$, or $u' <_P u$ and $v <_P v'$.

Lemma 9. Let $P$ be a poset covered by two disjoint chains $A$, $B$, and let $G = G(P)$. Consider a point $x \in \mathrm{STAB}(G)$ and two edges $uv, u'v' \in E(G)$ that are tight with respect to $x$, with $u, u' \in A$ and $v, v' \in B$. If $uv$ and $u'v'$ cross, then both $uv'$ and $u'v$ are edges of $G$, and both are tight with respect to $x$.

Proof. By Lemma 8(iii), $uv'$ and $u'v$ belong to $E(G)$. Assume, by contradiction, that $uv'$ is not tight. Then

$$
\begin{aligned}
x_v &= 1 - x_u && \text{(because $uv$ is tight)} \\
&> x_{v'} && \text{(because $uv'$ is not tight)} \\
&= 1 - x_{u'} && \text{(because $u'v'$ is tight)} \\
&\geq x_v && \text{(because $u'v$ is an edge and $x$ is feasible),}
\end{aligned}
$$

a contradiction. The same argument applies to $u'v$. We conclude that both $uv'$ and $u'v$ are tight. □

The point $x \in \mathrm{STAB}(G)$ is called locally optimal if, for every (connected) component $K$ of $G(x)$:

$$x_u = \frac{|A \cap K|}{|K|} \text{ for all } u \in A \cap K, \quad \text{and} \quad x_v = \frac{|B \cap K|}{|K|} \text{ for all } v \in B \cap K. \tag{8}$$

We say that the component $K$ is balanced if the local optimality condition (8) holds. Otherwise $K$ is unbalanced.
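To make the definitions concrete, the following sketch computes the components of $G(x)$ and tests condition (8). It is only an illustration under simplifying assumptions: the graph is passed as explicit vertex sets and an edge list, coordinates are exact `Fraction` values so that tightness can be tested with equality, feasibility of $x$ is not verified, and the function names are hypothetical.

```python
from fractions import Fraction

def tight_components(vertices, edges, x):
    """Connected components of G(x), the graph of tight edges (x_u + x_v = 1)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in edges:
        if x[u] + x[v] == 1:
            parent[find(u)] = find(v)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

def is_locally_optimal(A, B, edges, x):
    """Check condition (8) on every component of G(x)."""
    for K in tight_components(A | B, edges, x):
        a, b = len(K & A), len(K & B)
        for v in K:
            target = Fraction(a if v in A else b, a + b)
            if x[v] != target:
                return False
    return True

A, B = {"a1", "a2"}, {"b1"}
edges = [("a1", "b1"), ("a2", "b1")]
balanced = {"a1": Fraction(2, 3), "a2": Fraction(2, 3), "b1": Fraction(1, 3)}
halves = {v: Fraction(1, 2) for v in A | B}
print(is_locally_optimal(A, B, edges, balanced), is_locally_optimal(A, B, edges, halves))
```

In the second point all edges are tight as well, but the single component has two vertices in $A$ and one in $B$, so (8) would require the values 2/3 and 1/3 rather than 1/2.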

Consider a point $x \in \mathrm{STAB}^*(G)$. A component of $G(x)$ is trivial if it consists of a unique vertex, and non-trivial otherwise. A trivial component can be either balanced or unbalanced; in the latter case it is said to be loose. Observe that a trivial component $\{v\}$ is balanced if and only if $v$ is a cut-point of $P$, that is, $v$ is comparable to every other vertex of $P$. (Here we use the assumption $x_u > 0$ for the vertices $u \in N_G(v)$.)

The first part of the next lemma states that a component $K$ of $G(x)$ typically determines two (possibly trivial, or even empty) intervals, one in the chain $A$ and the other in the chain $B$. The exceptions are characterized by the following definition: we say that a component $L$ of $G(x)$ is an inlay of another component $K$ if there exists a vertex $w \in L$ and vertices $u, u'' \in K$ in the same chain as $w$ (that is, $u, u'' \in A$ iff $w \in A$) such that $u <_P w <_P u''$.

The second part of the lemma implies that P induces a linear order on the non- 
trivial components of G(x). This linear order naturally extends to all components 
of G(x), provided that no such component is loose. 

Below, when $S$ and $T$ are two disjoint subsets of the poset $P$, we write $S \leq_P T$ whenever $u \leq_P v$ holds for every $u \in S$ and every $v \in T$.

Lemma 10. Let $P$ be a poset covered by two disjoint chains $A$, $B$, and let $G = G(P)$. Consider a point $x$ in $\mathrm{STAB}^*(G)$. Then

(i) if a component $L$ of $G(x)$ is an inlay of a component $K$ of $G(x)$, then $L$ is loose;

(ii) if $K$, $L$ are distinct non-trivial components of $G(x)$, then either $K \leq_P L$ or $L \leq_P K$.

Proof. (i) Suppose otherwise. Let $w \in L$ and $u, u'' \in K$ be as above. Without loss of generality, we may assume that all three vertices belong to $A$ and $u' \notin K$ whenever $u' \in A$ and $u <_P u' <_P u''$. By Lemma 8(iii), because $K$ is a component of $G(x)$ containing $u$ and $u''$, there is a vertex $v$ of $K \cap B$ adjacent to both $u$ and $u''$ in $G(x)$.

By Lemma 8(ii), the neighborhood of $v$ in $G$ is an interval in $A$ containing $u$ and $u''$. Thus, it also contains $w$. Because $w$ does not belong to $K$, the edge $vw$ is not tight with respect to $x$.

First, suppose that $L$ is non-trivial. Thus there exists $w' \in B$, $w' \neq v$, such that $ww' \in E(G)$ and the edge $ww'$ is tight with respect to $x$. However, $ww'$ crosses either $uv$ or $u''v$, implying in both cases that $vw$ is tight by Lemma 9, a contradiction. Hence, $L$ is trivial and $L = \{w\}$.

Second, suppose that $L$ is balanced, so that $x_w = 1$. Because $x \in \mathrm{STAB}^*(G)$, we have $x_w + x_v = 1 + x_v > 1$, a contradiction. Hence, $L$ is loose.

(ii) On the contrary, suppose that neither $K \leq_P L$ nor $L \leq_P K$ holds. From what precedes, neither $K$ nor $L$ is an inlay. Consequently, we may assume $K \cap A \leq_P L \cap A$ and at the same time $L \cap B \leq_P K \cap B$, without loss of generality. Let $uv$ and $u'v'$ be edges of the components $K$ and $L$, respectively, with $u, u' \in A$ and $v, v' \in B$. By our assumption, these two edges cross. Hence, by Lemma 9 the edge $uv'$ is tight with respect to $x$. This implies that $u$ and $v'$ are in the same component of $G(x)$, a contradiction. □

Our algorithm for merging under partial information takes as input a locally optimal point $x \in \mathrm{STAB}(G)$. It repeatedly modifies $G$ and $x$, and then "rebalances" $x$ so that it becomes locally optimal again.

The rebalancing algorithm is described in Algorithm 4. Its input is a point $x \in \mathrm{STAB}^*(G)$. Given a component $K$ of $G(x)$, the slack of $K$ is defined as the real $\sigma$ minimizing

$$\left| x_v + \sigma - \frac{|A \cap K|}{|K|} \right|$$

under the constraint that $x' \in \mathrm{STAB}^*(G)$, where $v$ is any vertex in $A \cap K$ and $x'$ is obtained from $x$ by adding $\sigma$ to $x_w$ for all $w \in A \cap K$, and subtracting $\sigma$ from $x_w$ for all $w \in B \cap K$. In other words, $\sigma$ represents the maximum quantity by which we can "rebalance" $K$ without losing feasibility. (Note that the slack could be negative.)



Algorithm 4 Rebalancing Algorithm
1: while there exists an unbalanced component $K$ in $G(x)$ do
2:     compute the slack $\sigma$ of $K$
3:     put $x_v := x_v + \sigma$ for all $v \in A \cap K$, and $x_v := x_v - \sigma$ for all $v \in B \cap K$
4: end while

Here are a few properties of Algorithm 4 which are easy to check. Consider an iteration of the while-loop, and let $x'$ be the modified point $x$ at the end of the iteration.

When the component $K$ is "rebalanced", either it becomes a balanced component of the graph $G(x')$, or there is at least one edge in $G$ between $K$ and another component of $G(x)$ that became tight with respect to $x'$, and hence $K$ is "merged" with other components of $G(x)$ into a single component of $G(x')$.

In order to capture when this happens, we say that $K$ touches another component $L$ of $G(x)$ if $K$ and $L$ are linked by an edge of $G$ (that is not tight with respect to $x$) and there exists no non-trivial component $M$ distinct from $K$ and $L$ such that some edge $yz$ with both endpoints in $M$ is ranked between $K$ and $L$, that is, $K \cap A \leq_P \{y\} \leq_P L \cap A$ and $K \cap B \leq_P \{z\} \leq_P L \cap B$, or $L \cap A \leq_P \{y\} \leq_P K \cap A$ and $L \cap B \leq_P \{z\} \leq_P K \cap B$ (we assume $y \in A$ and $z \in B$).

It follows from Lemma 10(ii) that $K$ touches at most two non-trivial components of $G(x)$.

Lemma 11. Let $x \in \mathrm{STAB}^*(G)$ and $K$ be as above. If $K$ merges with a component $L$ of $G(x)$ then it touches $L$.

Proof. Consider any edge $vw$ that caused $K$ and $L$ to merge, with $v \in K$ and $w \in L$. If $K$ and $L$ do not touch, there exists a component $M$ and an edge $yz$ as above. By Lemma 9 applied to $G$ and $x'$, where $x' \in \mathrm{STAB}^*(G)$ is defined as above, we conclude that $M$ and $L$ are contained in the same component of $G(x')$. Because $x'_u = x_u$ for all vertices $u$ outside $K$, this implies that $M$ and $L$ are contained in the same component of $G(x)$, a contradiction. □

Considering a point $x \in \mathrm{STAB}^*(G)$, we color the components of $G(x)$ as follows: a component is colored red if it has at least as many vertices in $A$ as in $B$, and blue otherwise. The point $x$ is said to be color consistent if for every component $K$ of $G(x)$, and vertices $u \in A \cap K$ and $v \in B \cap K$, we have $x_u \geq 1/2$ and $x_v \leq 1/2$ if $K$ is red, and $x_u < 1/2$ and $x_v > 1/2$ if $K$ is blue. Observe that being color consistent is a relaxation of being locally optimal.

Lemma 12. Let $x, x' \in \mathrm{STAB}^*(G)$ and $K$ be as above. If $x$ is color consistent, then $x'$ is also color consistent. Moreover, $K$ cannot merge with components that have colors different from that of $K$, and the component of $G(x')$ containing $K$ has the same color as $K$.

Proof. First observe that, because $x$ is color consistent, the weight modification is such that, for $v \in A \cap K$, we have $x'_v \geq 1/2$ iff $x_v \geq 1/2$, and similarly, for $v \in B \cap K$, we have $x'_v > 1/2$ iff $x_v > 1/2$. Thus $x'$ is also color consistent if $K$ does not merge with other components. Now assume that $K$ does merge with other components. Consider an edge $vw$ of $G$ between $K$ and another component $L$ of $G(x)$ that became tight with respect to $x'$, with $v \in K$ and $w \in L$. Since $x'_w = x_w$ and $x_v + x_w < x'_v + x'_w = 1$, we have $x'_v > x_v$. There are four cases to consider:

- $v \in A$ and $K$ is red in $G(x)$. Then $x'_v > x_v \geq 1/2$ and $x_w = x'_w < 1/2$.
- $v \in A$ and $K$ is blue in $G(x)$. Then $x_v < 1/2$, hence $x'_v < 1/2$ and $x_w = x'_w > 1/2$.
- $v \in B$ and $K$ is red in $G(x)$. Then $x_v \leq 1/2$, hence $x'_v \leq 1/2$ and $x_w = x'_w \geq 1/2$.
- $v \in B$ and $K$ is blue in $G(x)$. Then $x'_v > x_v > 1/2$ and $x'_w = x_w < 1/2$.

In each case we conclude that $L$ had the same color as $K$ in $G(x)$, as claimed. Considering all components $L$ that merge with $K$, a direct consequence of what precedes is that the component of $G(x')$ containing $K$ also has the same color as $K$. Furthermore, $x'$ is color consistent. □

Finally, observe that the entropy of $x'$ is at most that of $x$. This is because the function

$$\xi \mapsto -\frac{|A \cap K|}{n} \log \xi - \frac{|B \cap K|}{n} \log(1 - \xi)$$

is strictly convex over the interval $(0, 1)$ with a minimum at $\xi = |A \cap K| / |K|$.



7.3 The Core of the Algorithm 

At the heart of our algorithm for merging under partial information is the following procedure. Given a locally optimal point $x \in \mathrm{STAB}(G)$, carefully pick a non-trivial component of $G(x)$ and merge the two corresponding chains. This causes all the edges between the two chains to disappear from $G$. This also creates loose components. Then, make $x$ color consistent by increasing certain coordinates of $x$ to $1/2$ or $1/2 + \varepsilon$. Finally, make $x$ locally optimal again by rebalancing it. At all times, the point $x$ remains feasible, that is, $x \in \mathrm{STAB}(G)$. This procedure is repeated as long as necessary. In the process, some vertices become cut-points, which reveals their respective ranks, and they are copied to an output chain.

This time, for merging a pair of chains, we use the Hwang-Lin algorithm [1972]. This is a simple near-optimal algorithm for merging two disjoint chains $X$ and $Y$ of different lengths. It proceeds by splitting the longest chain, say $X$, into blocks of size $2^{\lfloor \log(|X|/|Y|) \rfloor}$. Then every vertex in the smallest chain $Y$ is inserted into $X$, by first performing a linear search among the blocks, then a bisection within a block. The vertices in $Y$ are inserted in order, so that once a block of $X$ is discarded, it is never looked at again.

In the analysis of Algorithm 6, we will use the following bound on the number of 
comparisons performed by the Hwang-Lin algorithm. 

Lemma 13. Provided $|X| \geq |Y|$, the number of comparisons performed by the Hwang-Lin algorithm for merging two disjoint chains $X$ and $Y$ is at most

$$|Y| \log \frac{4|X|}{|Y|}.$$

Proof. It is known [Hwang and Lin 1972] that the number of comparisons performed by the Hwang-Lin merging algorithm is at most

$$|Y| \left( 1 + \left\lfloor \log \frac{|X|}{|Y|} \right\rfloor \right) + \left\lfloor \frac{|X|}{2^{\lfloor \log (|X|/|Y|) \rfloor}} \right\rfloor - 1.$$

Let $\xi \in [0, 1)$ be such that

$$\left\lfloor \log \frac{|X|}{|Y|} \right\rfloor = \log \frac{|X|}{|Y|} - \xi.$$

Then the number of comparisons is at most

$$|Y| \left( 1 - \xi + 2^{\xi} + \log \frac{|X|}{|Y|} \right) \leq |Y| \log \frac{4|X|}{|Y|},$$

where the last inequality follows from $1 - \xi + 2^{\xi} \leq 2$ for $\xi \in [0, 1)$. The result follows. □
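The block-based merging scheme described before Lemma 13 can be sketched as follows. This is a simplified illustration of that description (linear search over blocks of size $2^{\lfloor \log(|X|/|Y|) \rfloor}$, then bisection inside one block), not a faithful reproduction of the original Hwang-Lin procedure; it assumes $|X| \geq |Y|$ and distinct elements, and the names `hwang_lin_like_merge` and `less` are hypothetical.

```python
def hwang_lin_like_merge(X, Y, less):
    """Merge sorted lists X and Y (len(X) >= len(Y)); return (merged, comparisons)."""
    if not Y:
        return list(X), 0
    t = (len(X) // len(Y)).bit_length() - 1   # floor(log2(|X| / |Y|))
    block = 2 ** t
    out, used, start = [], 0, 0               # X[:start] has already been emitted
    for y in Y:
        # Linear search: discard whole blocks whose last element is below y.
        while start < len(X):
            end = min(start + block, len(X))
            used += 1
            if less(X[end - 1], y):
                out.extend(X[start:end])
                start = end
            else:
                break
        else:
            out.append(y)                     # X is exhausted, y goes last
            continue
        # Bisection inside X[start:end-1]; y is known to be below X[end-1].
        lo, hi = start, end - 1
        while lo < hi:
            mid = (lo + hi) // 2
            used += 1
            if less(X[mid], y):
                lo = mid + 1
            else:
                hi = mid
        out.extend(X[start:lo])
        out.append(y)
        start = lo
    out.extend(X[start:])
    return out, used

X = list(range(0, 64, 2))                     # 32 even numbers
Y = [5, 21, 47, 63]
merged, used = hwang_lin_like_merge(X, Y, lambda a, b: a < b)
t = (len(X) // len(Y)).bit_length() - 1
print(merged == sorted(X + Y), used)
```

In this sketch every element of $Y$ pays at most one "stopping" block comparison and at most $t$ bisection steps, and every block of $X$ is discarded at most once, so the count is at most $|Y|(1 + t) + \lceil |X| / 2^t \rceil$, which has the same shape as the bound used in the proof.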

Consider a locally optimal point $x \in \mathrm{STAB}(G)$, and a non-trivial component $K$ of $G(x)$. By Lemma 10(i), $K$ consists of two disjoint chains, namely $A \cap K$ and $B \cap K$, which form intervals in the chains $A$ and $B$, respectively.

We define the small chain of $K$ to be $A \cap K$ if $K$ is blue and $B \cap K$ if $K$ is red. The big chain of $K$ is the other one. Thus the small chain of $K$ is the one that has minimum cardinality, except that when the two chains $A \cap K$ and $B \cap K$ have the same cardinality, $K$ is red by definition and the small chain is $B \cap K$. Because $x$ is color consistent, all vertices $v$ in the small chain of $K$ have $x_v \leq 1/2$ (and even $x_v < 1/2$ when $K$ is blue). This is why vertices of the small chain of $K$ are called small vertices. Similarly, the vertices in the big chain of $K$ are called big vertices.

The component $K$ is said to be good if all the edges of $G$ having one endpoint in its small chain have their other endpoint either in the other chain of $K$ or in a component whose color is distinct from that of $K$.

Lemma 14. Suppose $x \in \mathrm{STAB}(G)$ is locally optimal. If $G(x)$ has at least one non-trivial red component, then one of them is good. The same is true for non-trivial blue components.

Proof. Suppose $G(x)$ has at least one non-trivial red component. Let $K$ be such a component which minimizes $|A \cap K| / |K|$. We will show that $K$ is a good component. Consider a vertex $v \in B \cap K$ and suppose $w$ is a neighbor of $v$ in $G$ which is outside $K$. Since $vw$ is not tight, $x_v + x_w < 1$. In particular, $x_w < 1$, and hence $w$ belongs to another non-trivial component $L$ of $G(x)$. If $L$ is red, by our choice of $K$ we have $|A \cap L| / |L| \geq |A \cap K| / |K|$. Thus

$$x_v + x_w = \frac{|B \cap K|}{|K|} + \frac{|A \cap L|}{|L|} \geq \frac{|B \cap K|}{|K|} + \frac{|A \cap K|}{|K|} = 1,$$

implying that $vw$ is tight, a contradiction. Hence $L$ must be blue, as claimed. The case of blue components is handled similarly. □

We are now ready to formally state the algorithm. For the sake of simplicity, the algorithm makes four assumptions. First, the given point $x \in \mathrm{STAB}(G)$ is locally optimal. Second, the contribution of the red components to the entropy of $x$ does not exceed that of the blue components. (The second assumption can be made without loss of generality: if this is not the case, simply exchange the chains $A$ and
B.) Third, the constant e on line 5 of the algorithm is set to e := Last, all 
the cut-points that P initially has have already been copied to the output chain at 
their respective final positions. 

When the chains $X$ and $Y$ are merged (see line 3 of Algorithm 5), all the edges of $G$ between vertices of $K$ disappear (in other words, the vertices in $K$ become comparable elements of $P$). As a result, all vertices of $K$ become loose. We will prove that increasing the corresponding coordinates of $x$ to at least $1/2$ or $1/2 + \varepsilon$ (see the for-loop in lines 4-6 of Algorithm 5) makes $x$ color consistent, so that Algorithm 4 can be applied.

Before proving this last fact, we study in more detail the evolution of the graph during an iteration of the algorithm. Again, we use different symbols to denote the current objects (graph, point) at different moments of the algorithm: $G$ and $x$ respectively denote the graph and point at the beginning of an iteration of the main loop (line 2), $G'$ denotes the graph after the merging (lines 4-12), and $x'$ denotes the point just after the first for-loop (line 7).

Algorithm 5 Core of the Algorithm for Merging under Partial Information
1: while $G(x)$ has a non-trivial component do
2:     pick a good component $K$, giving higher priority to red components
3:     merge the chains $X := A \cap K$ and $Y := B \cap K$ with the Hwang-Lin algorithm
4:     for $v \in K$ do
5:         put $x_v := \max\{x_v, 1/2\}$ if $v \in A$, and $x_v := \max\{x_v, 1/2 + \varepsilon\}$ if $v \in B$
6:     end for
7:     rebalance $x$ using Algorithm 4
8:     for $v \in K$ do
9:         if $x_v = 1$ then
10:            copy $v$ at its final position in the chain $C$
11:        end if
12:    end for
13: end while
14: return $C$

As the reader can verify, $G'$ is a spanning subgraph of $G$ containing none of the edges of $G$ with both endpoints in $K$ and some of the edges of $G$ with exactly one endpoint in $K$. The other edges of $G$, that is, the edges of $G$ with both endpoints outside $K$, survive in $G'$ if and only if both of their endpoints are ranked below $K$, or both are ranked above $K$. In particular, all the edges linking two vertices of the same component of $G(x)$ survive in $G'(x)$, provided this component is distinct from $K$.

Lemma 15. Using the notations above, $x'$ is a color consistent point of $\mathrm{STAB}(G')$.

Proof. Observe that $\{v\}$ is a loose component of $G'(x)$ for every $v \in K$. We will show that this remains true in the graph $G'(x')$.

Let $v \in K$. First suppose $K$ is red in $G(x)$. If $v \in A$, then $x_v \geq 1/2$, thus $x'_v = x_v$, and hence $\{v\}$ remains loose in $G'(x')$. If $v \in B$, then $x_v \leq 1/2$, implying $x'_v = 1/2 + \varepsilon$. However, since $K$ was a good component, every neighbor $w$ of $v$ in $G'$ belonged to a blue component of $G(x)$, and thus $x_w < 1/2$. Since $x$ is locally optimal, this implies $x_w \leq 1/2 - 1/n = 1/2 - 2\varepsilon$. It follows that

$$x'_v + x'_w = (1/2 + \varepsilon) + x_w \leq (1/2 + \varepsilon) + (1/2 - 2\varepsilon) < 1,$$

implying that $vw$ is not tight with respect to $x'$. Therefore, $\{v\}$ is loose in $G'(x')$.

Now assume $K$ is blue in $G(x)$. If $v \in B$, then $x_v > 1/2$, and thus $x_v \geq 1/2 + 1/n = 1/2 + 2\varepsilon$ (because $x$ is locally optimal). Hence, $x'_v = x_v$, and the component $\{v\}$ is loose in $G'(x')$. If $v \in A$, then $x_v < 1/2$ and thus $x'_v = 1/2$. Since the algorithm gives priority to good components that are red, Lemma 14 implies that all red components in $G(x)$ are trivial. Since $x$ is locally optimal, none of them is loose. Since $x_v > 0$ and since $K$ was a good component, it follows that $v$ is only adjacent to vertices in $B \cap K$ in $G$. Therefore, $v$ is an isolated vertex of $G'$, and the component $\{v\}$ is loose in $G'(x')$.

It follows that the components of $G'(x')$ are exactly those of $G(x)$ that are distinct from $K$ plus the loose components $\{v\}$ for all $v \in K$. Since $x'_v \geq 1/2$ if $v \in A \cap K$ and $x'_v > 1/2$ if $v \in B \cap K$, and because $x'_w = x_w$ for every $w \in V(G) - K$, we deduce that $x'$ is color consistent. □

Observe that after the first for-loop (lines 4-6), all small vertices of K become 
big, and all big vertices stay big. By Lemmas 12 and 15, the rebalancing step (line 
7) preserves the status of all the vertices, that is, small vertices stay small and big 
vertices stay big. 

Lemma 16. Let $P$ be a poset of order $n$ covered by two disjoint chains $A$, $B$, and let $G = G(P)$. Assume that $x \in \mathrm{STAB}(G)$ is a locally optimal point such that the contribution of the red components to $H(x)$ is not larger than that of the blue components. Then, Algorithm 5 merges $A$ and $B$ in at most $3nH(x)$ comparisons.

Proof. Let $k$ be the total number of iterations of the while-loop. Consider the $j$th iteration. Let $G_j$ denote the graph $G$ at the beginning of that iteration. In this proof, we deviate from the notations used above, and denote by $x'$, $x''$, and $x'''$ the feasible point under consideration at the beginning of the iteration, at the end of the first for-loop, and at the end of the iteration of the while-loop, respectively. (We keep the notation $x$ for the original point given as input.) Let $K$ be the good component chosen at that iteration. Let $s_j$ be the number of small vertices in $K$. (Thus, $s_j = |K \cap A|$ if $|K \cap A| < |K \cap B|$, and $s_j = |K \cap B|$ otherwise.) Let similarly $t_j$ be the number of big vertices in $K$.

Let $r_j$ be the number of small red vertices in $G_j(x')$, and let $\phi_j := nH(x') + r_j$. Let also $r_{k+1} := 0$ and $\phi_{k+1} := 0$.

From Lemma 13, we know that when the two chains $K \cap A$ and $K \cap B$ are merged, the Hwang-Lin algorithm spends at most $s_j \log \frac{4 t_j}{s_j}$ comparisons.

First suppose $K$ is red in $G_j(x')$. During the first for-loop (lines 4-6), the algorithm increases $x'_v$ to at least $1/2$ for every small vertex $v$ in $K$. Thus, while such a vertex contributed $-\frac{1}{n} \log \frac{s_j}{s_j + t_j}$ to the entropy of $x'$, its contribution to that of $x''$ is at most $-\frac{1}{n} \log \frac{1}{2} = \frac{1}{n}$. It follows that

$$H(x') - H(x'') \geq -\frac{s_j}{n} \log \frac{s_j}{s_j + t_j} - \frac{s_j}{n} = \frac{s_j}{n} \log \frac{s_j + t_j}{s_j} - \frac{s_j}{n}.$$

Since the rebalancing algorithm does not increase the entropy, we have $H(x''') \leq H(x'')$, and hence

$$H(x') - H(x''') \geq \frac{s_j}{n} \log \frac{s_j + t_j}{s_j} - \frac{s_j}{n}. \tag{9}$$

Let us look at the difference $r_j - r_{j+1}$. Every small vertex in $K$ becomes big at the end of the for-loop, and all other vertices keep their status. Also, as we have already seen, the status of the vertices does not change during the rebalancing algorithm. Thus

$$r_j - r_{j+1} \geq s_j. \tag{10}$$

Now assume $K$ is blue in $G_j(x')$. Since the algorithm gives priority to good components that are red, and there is at least one such component if there is a red component, it follows that every component of $G_j(x')$ is blue. Then it can be checked that $x'_v$ is increased to 1 for every small vertex $v$ in $K$ by the algorithm. (In fact, this is also true for every big vertex in $K$, though we will not use that fact.) We have

$$H(x') - H(x'') \geq -\frac{s_j}{n} \log \frac{s_j}{s_j + t_j} = \frac{s_j}{n} \log \frac{s_j + t_j}{s_j}.$$

Again, we have $H(x''') \leq H(x'')$, which implies

$$H(x') - H(x''') \geq \frac{s_j}{n} \log \frac{s_j + t_j}{s_j}. \tag{11}$$

Here, there are no red vertices anymore. Thus

$$r_j - r_{j+1} = 0. \tag{12}$$

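It follows from (9) and (10) (in the red case) and from (11) and (12) (in the blue case) that

$$\phi_j - \phi_{j+1} \geq s_j \log \frac{s_j + t_j}{s_j}. \tag{13}$$

The right-hand side of (13) can be bounded as follows:

$$s_j \log \frac{s_j + t_j}{s_j} \geq \frac{1}{2} s_j \log \frac{4 t_j}{s_j}. \tag{14}$$

This inequality follows from the fact that

$$\left( \frac{s_j + t_j}{s_j} \right)^2 = \left( \frac{t_j}{s_j} - 1 \right)^2 + \frac{4 t_j}{s_j} \geq \frac{4 t_j}{s_j}.$$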
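Inequality (14) is equivalent to $(s_j + t_j)^2 \geq 4 s_j t_j$, which can be confirmed numerically on small values. The snippet below is only a sanity check over an arbitrarily chosen range; `lhs`, `rhs` and the range bound 60 are illustrative choices.

```python
import math

def lhs(s, t):
    """Left-hand side of (14): s * log((s + t) / s)."""
    return s * math.log2((s + t) / s)

def rhs(s, t):
    """Right-hand side of (14): (1/2) * s * log(4t / s)."""
    return 0.5 * s * math.log2(4 * t / s)

# s counts small vertices and t big vertices, so 1 <= s <= t.
pairs = [(s, t) for t in range(1, 60) for s in range(1, t + 1)]
worst_gap = min(lhs(s, t) - rhs(s, t) for s, t in pairs)
print(worst_gap)
```

The gap vanishes exactly when $s_j = t_j$, which is the case where the bound in (14) is tight.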

Now, let $q$ be the total number of comparisons done by the algorithm. Using (13), (14), and Lemma 13, we obtain

$$\frac{q}{2} \leq \frac{1}{2} \sum_{j=1}^{k} s_j \log \frac{4 t_j}{s_j} \leq \sum_{j=1}^{k} s_j \log \frac{s_j + t_j}{s_j} \leq \sum_{j=1}^{k} (\phi_j - \phi_{j+1}) = \phi_1.$$

Thus

$$q \leq 2 \phi_1 = 2nH(x) + 2r_1.$$

The number $r_1$ of small red vertices in $G_1(x)$ is equal to $nH(\tilde{x})$, where $\tilde{x}$ is the point defined by letting $\tilde{x}_v = 1/2$ for every small red vertex $v$ in $G_1(x)$, and letting $\tilde{x}_v = 1$ for every other vertex. The entropy of $\tilde{x}$ is at most the contribution of the red components in $G_1(x)$ to the entropy of $x$. The latter contribution is in turn, by our assumption, at most $H(x)/2$. Therefore,

$$q \leq 2nH(x) + 2r_1 = 2nH(x) + 2nH(\tilde{x}) \leq 3nH(x),$$

as claimed. □

7.4 Putting Pieces Together: the Final Algorithm 

Our algorithm for the problem of merging under partial information is given below, see Algorithm 6. By combining Theorem 2 and Lemma 16, we obtain the following result. In the next section, we prove that the algorithm can be implemented so that its global complexity is $O(n^2 \log^2 n)$.

Theorem 6. Let $P$ be a poset covered by two disjoint chains $A$, $B$, and let $G = G(P)$. Then Algorithm 6 merges $A$ and $B$ in at most $6 \log e(P)$ comparisons.



Algorithm 6 Algorithm for Merging under Partial Information
1: compute a point $x \in \mathrm{STAB}(G)$ with $H(x)$ minimum
2: if the contribution of the red components to $H(x)$ exceeds that of the blue components then
3:     exchange the chains $A$ and $B$
4: end if
5: for $v \in A \cup B$ do
6:     if $v$ is a cut-point then
7:         copy $v$ at its final position in the chain $C$
8:     end if
9: end for
10: call Algorithm 5
11: return $C$



7.5 Complexity 

In this section, we sketch an efficient implementation of the main steps of Algorithm 
6, namely computing the entropy of a poset of width at most 2 (line 1 of Algorithm 
6), merging a pair of disjoint chains (line 3 of Algorithm 5, called by Algorithm 6), 
and updating after a merging (lines 4-12 of Algorithm 5, called by Algorithm 6). 

There are some differences between the way the algorithms are described above, 
and the way they are implemented here: for the sake of efficiency, we sometimes 
change the order of some steps or use ways to accelerate some others. 

We start by briefly discussing the data structures used. 

Data Structures. The two chains $A = \{a_1, \dots, a_{|A|}\}$ and $B = \{b_1, \dots, b_{|B|}\}$ are kept in separate vectors which are never modified during the course of the algorithm (throughout, we assume $a_i \leq_P a_{i+1}$ for $i = 1, \dots, |A| - 1$ and $b_j \leq_P b_{j+1}$ for $j = 1, \dots, |B| - 1$). The output chain $C$ is a vector of size $n$, initialized arbitrarily. As soon as the 'true' rank (that is, the rank in the linear order $\leq$) of a vertex is known, it is copied to the corresponding entry of $C$.

The data structure for the incomparability graph $G$ has two parts: a static part and a dynamic part. The dynamic part also contains information that allows us to monitor the evolution of the point $x \in \mathrm{STAB}(G)$, and in particular the components of $G(x)$.

The static part records the initial neighborhood of each vertex of G, as it is at 
the beginning of the algorithm. Because each of these neighborhoods is an interval 
of either A or B, it suffices to record the indices of the first and last vertex within 
each interval. For instance, consider a vertex $v \in B$, and let $[a_i, a_j] := \{a_i, \ldots, a_j\}$
denote its neighborhood in G. Then we record the pair $(i, j)$. We allow $j = i - 1$
when v is a cut-point. (In this case, the rank of v in the linear order $\le$ is precisely
its rank in the chain B, plus $i - 1$.)

The dynamic part consists of the list of non-trivial components of G and, for 
each such component of G, the list of non-trivial components of G(x) contained in 
the corresponding component of G. The order of the components in each list is 
kept consistent with P. (Recall that $x \in \mathrm{STAB}^*(G)$ during the whole algorithm,
and hence P induces a linear ordering on the non-trivial components of G(x) by
Lemma 10.) Trivial components (balanced or not) are not explicitly stored. 

Extra information is stored in the nodes of these lists. Consider a component Z 
of G. Then, by Lemma 8, both $A \cap Z$ and $B \cap Z$ are intervals. We store the indices
of the first and last vertices of each of these intervals, in the node for Z. This 
is used to implicitly maintain the neighborhood of each vertex of G: the current 
neighborhood of a vertex is the intersection of its initial neighborhood and of the 
component of G that contains it. 
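The interval bookkeeping described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, and the choice of representing an interval as a pair of 1-based indices (lo, hi) with lo > hi meaning "empty", are assumptions made for the example. The cut-point rank is taken to be the rank in B plus i - 1, which is how we read the parenthetical remark on cut-points.

```python
def current_neighborhood(initial, component):
    """Intersect two closed index intervals given as (lo, hi) pairs.

    `initial` is the static neighborhood recorded for a vertex and
    `component` is the interval stored in the node of the component of G
    containing that vertex (on the opposite chain). The result may be empty.
    """
    lo = max(initial[0], component[0])
    hi = min(initial[1], component[1])
    return (lo, hi)


def is_empty(interval):
    """An interval (lo, hi) is empty when lo > hi, e.g. (i, i - 1)."""
    return interval[0] > interval[1]


def cut_point_rank(rank_in_B, i):
    """Final rank of a cut-point v in B whose recorded pair is (i, i - 1).

    Exactly i - 1 vertices of A lie below v, so they are added to the
    rank of v inside its own chain (assumed 1-based).
    """
    return rank_in_B + i - 1
```

For instance, a vertex whose initial neighborhood is (2, 7) and whose component covers (4, 9) on the opposite chain currently sees the interval (4, 7). Nothing is rewritten per vertex when a component shrinks, which is the point of keeping the static part immutable.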

In the node corresponding to a non-trivial component K of G(x), we store the
indices of the first and last vertices of the intervals $A \cap K$ and $B \cap K$. We also
store the value of $x_u$ for some $u \in A \cap K$ and of $x_v$ for some $v \in B \cap K$ (because
K is a component of G(x), the point x is constant on both $A \cap K$ and $B \cap K$).

On the side, we maintain the list of unbalanced components of G(x) (here, the
order of the components in the list is arbitrary). Extra information is placed in the
lists of components of G(x) so that locating a given unbalanced component takes 
constant time. Similarly, we maintain the list of good components of G(x) (the red 
components are systematically placed before the blue ones). 

Computing the Entropy. In Appendix B, we prove the next result, which implies
that line 1 of Algorithm 6 can be performed in $O(n^2 \log^2 n)$ time. This is because,
by virtue of Lemma 8, the incomparability graph of a poset of width
at most 2 is bipartite and biconvex, thus in particular bipartite and convex.

Lemma 17. The entropy of an n-vertex convex bipartite graph G can be computed
in time $O(n^2 \log^2 n)$.

Rebalancing. Assume for now that the current point x in Algorithm 4 is such 
that G(x) has no loose component. Then, processing an unbalanced component K 
of G(x) (see lines 2-3 of Algorithm 4) can be done in constant time: it suffices to 
check the components of G(x) that touch K, and perform the necessary updates. 
Letting Z denote the component of G that contains K, these components are the 
neighbors of K in the list of components of G(x) contained in Z. Thus there are 
at most two components to check. 

If G(x) does have loose components, we handle them separately before
calling Algorithm 4: these components are treated simultaneously and in constant
time during a specific "updating" phase right after the merging of the two chains
X and Y. This is explained below.

Merging. We implement the Hwang-Lin algorithm (line 3 of Algorithm 5, for a 
description see Section 7.3 or the original paper [Hwang and Lin 1972]) so that 
its complexity is proportional to the number of comparisons it performs, plus the 
number of cut-points discovered. This is possible because none of the vertices in A 
or B is moved. Each cut-point is copied to the output chain as soon as it is found
(more details are given below).

Updating After a Merging. After the chains X and Y are merged, they are both
split into at most three intervals: $X_1$ contains the vertices in X that are ranked below
all vertices in Y in the merged chain, $X_2$ contains the vertices in X that are ranked
between two vertices in Y in the merged chain, and $X_3$ contains all the other vertices
in X, that is, all those that are ranked above all vertices in Y. The intervals $Y_1$,
$Y_2$ and $Y_3$ are defined similarly. Obviously, either $X_1$ or $Y_1$ is empty, and the same
holds for $X_3$ and $Y_3$.
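The three-way split can be illustrated with a short sketch. It assumes, purely for the example, that the merged chain is given as a list of labels "X" or "Y" in increasing order and that both chains are non-empty; the function name `split_after_merge` is hypothetical. Intervals are returned as lists of positions in the merged chain.

```python
def split_after_merge(merged):
    """Split X and Y into (below, middle, above) parts after a merge.

    `merged` lists the labels "X" / "Y" of the merged chain from bottom
    to top. For each chain, `below` holds the positions ranked below every
    vertex of the other chain, `above` those ranked above every vertex of
    the other chain, and `middle` those lying between two of its vertices.
    """
    pos = {"X": [], "Y": []}
    for rank, label in enumerate(merged):
        pos[label].append(rank)

    parts = {}
    for own, other in (("X", "Y"), ("Y", "X")):
        lo, hi = pos[other][0], pos[other][-1]  # extreme ranks of the other chain
        below = [r for r in pos[own] if r < lo]
        above = [r for r in pos[own] if r > hi]
        middle = [r for r in pos[own] if lo < r < hi]
        parts[own] = (below, middle, above)
    return parts


parts = split_after_merge(list("XXYXYYX"))
x1, x2, x3 = parts["X"]
y1, y2, y3 = parts["Y"]
```

In this example the bottom element of the merged chain belongs to X, so the below-part of Y is empty, in line with the observation that the below-parts (and likewise the above-parts) of X and Y cannot both be non-empty.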

The vertices of the middle intervals $X_2$ and $Y_2$ become cut-points and are thus
copied in the output chain C. Some extra vertices in $X_1$, $Y_1$, $X_3$ or $Y_3$ may also
become cut-points. This is determined by inspecting the neighborhoods of at most
four vertices in the components of G(x) that touch the component K. More precisely,
letting J and L denote the components of G(x) that touch K and directly
precede or follow K, respectively (possibly, J or L is not defined), we only
have to inspect the last vertices of $J \cap A$ and $J \cap B$ and the first vertices of $L \cap A$
and $L \cap B$.

The above information, which describes the precise way in which components of 
G and G(x) change, can be obtained during the merging, essentially at no extra
cost. Knowing it, we can update the list of components of G, and the lists of 
components of G(x): the component of G containing K is typically split in two 
components, K is deleted, and the components of G(x) that touch the component 
K are updated, as is explained in the next paragraph. 

The vertices in $X_1 \cup Y_1$ that do not become cut-points (if any) are incorporated
in the component J (these vertices exactly correspond to the loose components of
G'(x) that touch K), and the vertices in $X_3 \cup Y_3$ that do not become cut-points
(if any) are incorporated in the component L (these vertices exactly correspond to
the loose components of G'(x) that touch L).

Thus we do not implement lines 4-7 of Algorithm 5 as is, but we rather process all 
loose components simultaneously, and then continue the rebalancing step normally. 
As said previously, a similar remark is in order for lines 8-12 of Algorithm 5: we 
actually copy cut-points in the output chain as soon as possible. 

The possible evolution of the components after a merging is shown in Figure 4. 
We have illustrated three cases: (i) J and L have the same color as K, (ii) only L
has the same color as K, (iii) both J and L have a color different from that of K. The only edges
with at least one endpoint in K that may be present in G' are displayed in the 
figure. Portions of the chains shown in gray depict vertices that become cut-points. 

Setting aside the operations needed for merging pairs of chains or discovering
cut-points, the number of operations is linear in the number of
components that G(x) initially had, and each such operation takes constant time.
We thus infer the following result, which is crucial to the next section.

Lemma 18. Algorithm 5 can be implemented so that its running time is $O(q) + O(n)$,
where q is the number of comparisons performed.

Lemmas 17 and 18 together imply that Algorithm 6 can be implemented so that
its running time is $O(n^2 \log^2 n)$.

Fig. 4. Evolution of the components after a merging: three main cases.

8. REDUCING THE SORTING COMPLEXITY 

Recall that the preprocessing complexity is the number of operations performed 
before the first comparison, while the remaining operations account for the sorting 
complexity. Our goal in this section is to provide an algorithm whose sorting
complexity is $O(\log e(P)) + O(n)$. By confining the entropy computation to the
preprocessing phase, we are able to reuse the result of this preprocessing to sort
any other instance with the same partial information.

The main idea of Algorithm 7 is to compute a minimum entropy point x of a
bipartite graph G that can be defined before the sorting phase, solely on the basis 
of the initial partial information P. The following lemma shows that the entropy 
of x provides enough information to guide the sorting phase. 

Lemma 19. Algorithm 7 is an algorithm for the problem of sorting under partial 
information with query complexity at most 15.09 log e(P). 

Proof. The proof is the same as that of Theorem 4, except that the query complexity
of Algorithm 5 for merging under partial information is now at most 3nH(x),
by Lemma 16.

Since G is a subgraph of G(P), its entropy is at most that of G(P). Combining
this with Theorem 2, we get:

$$3nH(x) \le 3nH(P) \le 6 \log e(P).$$

Hence, following the same reasoning as in Theorem 4, we obtain that Algorithm 7
has query complexity at most (9.09 + 6) log e(P) = 15.09 log e(P). □

8.1 Preprocessing 

The preprocessing phase involves computing the entropy of a convex bipartite graph
G. This can be done in $O(n^2 \log^2 n)$ time; see Lemma 17.

Algorithm 7 Sorting under Partial Information with reduced sorting complexity

 1: {Phase 1 (preprocessing)}
 2: find a maximum chain $A \subseteq P$
 3: let G be the bipartite spanning subgraph of G(P) containing all the edges between A and $P - A$, and no other edge
 4: compute a point $x \in \mathrm{STAB}(G)$ with H(x) minimum
 5: compute the function f {(see below)}
 6: compute a greedy chain decomposition and the corresponding Huffman tree for the poset $P - A$
 7: {Phase 2 (sorting)}
 8: apply Algorithm 2 to the poset $P - A$, yielding a chain B
 9: update the graph G
10: compute the components of G(x) and eliminate inlays by improving x
11: handle loose vertices of G(x)
12: rebalance x using Algorithm 4
13: if the contribution of the red components to H(x) exceeds that of the blue components then
14:     exchange the chains A and B
15: end if
16: call Algorithm 5
17: return the resulting chain C

During this phase, we also compute the function f that associates to each interval
[c, d] of the chain A the maximum M of $x_u$ over all $u \in [c, d]$, together with the
largest interval $[c', d'] \subseteq [c, d]$ such that $x_{c'} = x_{d'} = M$. This can be done in $O(n^2)$
time and space using a straightforward dynamic program.
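One possible form of this dynamic program is sketched below. It is an illustration under stated assumptions rather than the authors' code: the name `build_f`, the 0-based indexing of the chain A, and the table layout `f[c][d] = (M, c', d')` are choices made for the example. The table for [c, d] is obtained from the one for [c, d - 1] in constant time, which gives quadratic time and space overall.

```python
def build_f(x):
    """Tabulate f for every interval [c, d] of a chain with weights x.

    f[c][d] = (M, c1, d1), where M is the maximum of x over [c, d] and
    [c1, d1] is the largest sub-interval whose two endpoints both attain M,
    that is, c1 is the first and d1 the last position of the maximum.
    Entries with d < c are left as None.
    """
    n = len(x)
    f = [[None] * n for _ in range(n)]
    for c in range(n):
        best, first, last = x[c], c, c
        f[c][c] = (best, first, last)
        for d in range(c + 1, n):
            if x[d] > best:        # strictly larger: new maximum, interval resets
                best, first, last = x[d], d, d
            elif x[d] == best:     # tie: the interval extends to the right
                last = d
            f[c][d] = (best, first, last)
    return f


f = build_f([0.2, 0.5, 0.3, 0.5, 0.1])
```

With the weights above, `f[0][4]` is `(0.5, 1, 3)`: the maximum 0.5 is attained first at position 1 and last at position 3. Exact rational weights (for example `fractions.Fraction`) would avoid any concern about comparing floating-point values for equality.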

8.2 Sorting 

We now have to show that the sorting complexity is $O(\log e(P)) + O(n)$. Thus all
operations of the sorting phase of Algorithm 7 have to be implemented with an $O(n)$
overhead. The main issues in that respect are the complexities of lines 9-12.

Updating the Graph G. In line 9 of the algorithm, we modify G so that it becomes 
the incomparability graph of the partial information we have right after the chain 
B has been computed. This is an update, in the sense that the new graph G will 
be a spanning subgraph of the old one. 

To perform this update in linear time, we make use of the structural observations 
of Lemma 8. In particular, property (iii) of this lemma allows us to recover the 
incomparability interval of every vertex in A (B) by scanning twice the chain B 
(A, respectively). More precisely, we first scan the chain A from bottom to top 
and find, for each vertex in A, the lower endpoint of its incomparability interval in 
B. These endpoints are increasing; hence, this does not require any backtracking
in B. A second scan from top to bottom yields the upper endpoints. The
incomparability intervals for vertices in B are computed similarly. Therefore, the
whole graph G can be computed in $O(n)$ time.

Finding the Components of G(x). At line 10 of the algorithm, we aim at computing
the components of G(x) and encoding them in the data structure described
in Section 7.5. During this step, we also modify the point x so that inlays in G(x)
are avoided. These weight modifications consist simply in increasing $x_v$ for some
vertices v (without losing feasibility); hence the entropy of x can only decrease
during this step.

We proceed by scanning B from bottom to top. First, for every vertex v in B,
we apply the function f to the incomparability interval [c, d] of v, and obtain the
corresponding maximum M(v) and an interval $[c', d'] \subseteq [c, d]$. If $x_v + M(v) = 1$,
then v, c', and d' belong to the same component of G(x) (possibly c' = d'). We
save such "tight" intervals in a list, together with the corresponding vertices v, and
forget about the intervals that are not tight.

Next, we compute in linear time the union of all tight intervals in the list (by
scanning the list once and merging consecutive intervals when they intersect). This
results in a collection of disjoint intervals $[u_1, u'_1], \ldots, [u_\ell, u'_\ell]$ of A. For each such
interval $[u_i, u'_i]$, we can compute in constant time the smallest vertex $v_i \in B$ and
largest vertex $v'_i \in B$ such that the tight intervals of $v_i$ and $v'_i$ are included in
$[u_i, u'_i]$ (possibly $v_i = v'_i$). Observe that the intervals $[v_1, v'_1], \ldots, [v_\ell, v'_\ell]$ of B are
also disjoint. Also, it can be checked that for every i, we have
$x_{v_i} = x_{v'_i} = 1 - x_{u_i} = 1 - x_{u'_i}$. Moreover, for every $v \in [v_i, v'_i]$ we have
$x_v \le x_{v_i} = x_{v'_i}$. Similarly, for every $u \in [u_i, u'_i]$ we have $x_u \le x_{u_i} = x_{u'_i}$.
Thus, for every $i \in \{1, \ldots, \ell\}$, we can safely update the point x as follows: we
increase $x_v$ to $x_{v_i}$ for every $v \in [v_i, v'_i]$, and similarly increase $x_u$ to
$x_{u_i} = 1 - x_{v_i}$ for every $u \in [u_i, u'_i]$.

The non-trivial components of G(x) (with x updated as above) are exactly given
by the collection $[u_i, u'_i] \cup [v_i, v'_i]$ for $i \in \{1, \ldots, \ell\}$. Notice that the components of
G(x) are now free of inlays (as defined in Section 7.2).
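The single-scan union of tight intervals can be sketched as follows. The sketch assumes that the list of intervals is already sorted by left endpoint (which is what one would expect when the list is produced by scanning B from bottom to top) and that intervals are closed pairs of integer indices into A; the function name is hypothetical.

```python
def union_of_tight_intervals(intervals):
    """Merge closed intervals (lo, hi), given sorted by lo, in one pass.

    Two consecutive intervals are merged when they intersect, that is,
    when the next one starts no later than the current one ends.
    Returns the resulting pairwise disjoint intervals, in order.
    """
    merged = []
    for lo, hi in intervals:
        if merged and lo <= merged[-1][1]:
            # Overlap with the last merged interval: extend it if needed.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return [tuple(interval) for interval in merged]


disjoint = union_of_tight_intervals([(1, 3), (2, 5), (7, 8), (8, 9)])
```

Here the four input intervals collapse to (1, 5) and (7, 9). Each input interval is examined once, so the scan is linear in the length of the list.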

Handling Loose Vertices. It remains to process vertices that are not incident to 
any tight edge in G(x), but that are not cut-points either (line 11 of the algorithm). 
We again scan B bottom-up, and for each loose vertex $v \in B$, check the weights
associated with the non-trivial components touching the component {v}. There
are at most two such components. We also apply the function f to the interval
of vertices in A strictly between those components, and within the bounds of the
incomparability interval of v. This allows us to determine in constant time a slack
value by which we can increase $x_v$. The vertex v may now be included in a previously
defined component of G(x), or form a new component with loose vertices of A.

Afterwards, the remaining loose vertices of A can be eliminated in a similar 
fashion. For those, however, no new component can be created, as there are no 
loose vertices remaining in B. 

It can be checked that the above procedure involves only a constant number of
operations per vertex. Therefore, the complexity of lines 10-11 is $O(n)$.

Rebalancing. The rebalancing step (line 12) involves Algorithm 4. At every iteration
of this algorithm, the number of components of G(x) decreases, hence there can
be at most a linear number of iterations. Every iteration takes constant time using
the data structure described in Section 7.5 for the components (this data structure
can be used because G(x) has no loose components). Thus the complexity of the
rebalancing step is $O(n)$ as well.

A. GREEDY CHAIN DECOMPOSITIONS

Any given poset P can be canonically decomposed into "levels". To construct this
decomposition, we find the set $L_1$ of minimal elements of P (that is, the elements
of P without predecessor), then the set $L_2$ of minimal elements of $P - L_1$, and continue
likewise until we find a set $L_h$ such that $P - L_1 - \cdots - L_h$ is empty. The set $L_i$
is the $i$th level of P, and h is the height of P. By construction, every element of
$L_i$ has a predecessor in $L_{i-1}$, for $i = 2, \ldots, h$. Thus P contains a chain of size h.
Because each level is an antichain, the maximum size of a chain in P is precisely h.

The levels of a poset P of order n can be found in time $O(n^2)$. If, while constructing
the levels, we record for each vertex in a level $L_i$ with $i \ge 2$ one of its
predecessors in the previous level $L_{i-1}$, a maximum chain of P can then be found
in time $O(h)$.
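The level construction and the extraction of a maximum chain can be sketched as below. This is a minimal illustration under assumptions of our own: elements are numbered 0 to n - 1 with n at least 1, the order is given as a list of pairs (u, v) meaning that u precedes v, and the function names are hypothetical. Levels are peeled off by counting the predecessors that have not yet been removed.

```python
def levels_with_predecessors(n, relations):
    """Decompose a poset on {0, ..., n-1} into levels.

    `relations` is a list of pairs (u, v) with u below v. Returns the list
    of levels and, for each element outside the first level, one recorded
    predecessor lying in the previous level (None for the first level).
    """
    successors = [[] for _ in range(n)]
    remaining = [0] * n            # predecessors not yet removed
    for u, v in relations:
        successors[u].append(v)
        remaining[v] += 1

    pred = [None] * n
    level = [v for v in range(n) if remaining[v] == 0]
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for u in level:
            for v in successors[u]:
                remaining[v] -= 1
                if remaining[v] == 0:
                    # u is the last predecessor removed, so it lies in the
                    # current level and v becomes minimal in what is left.
                    pred[v] = u
                    next_level.append(v)
        level = next_level
    return levels, pred


def maximum_chain(levels, pred):
    """Follow recorded predecessors down from any element of the top level."""
    chain = []
    v = levels[-1][0]
    while v is not None:
        chain.append(v)
        v = pred[v]
    return chain[::-1]


levels, pred = levels_with_predecessors(5, [(0, 1), (1, 2), (0, 3), (3, 2), (4, 2)])
chain = maximum_chain(levels, pred)
```

In this small example the levels are {0, 4}, {1, 3} and {2}, and the chain returned has one element per level. The peeling loop touches each relation once, so with up to quadratically many relations it stays within the quadratic bound stated above, and the chain is recovered in time proportional to the height.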

Proposition 1. There is an $O(n^{2.5})$ algorithm finding a greedy chain decomposition
of any poset of order n.

Proof. We assume we know all the relations of P. If needed, we compute a
transitive closure in time $O(n^\omega)$, where $\omega$ is any real such that any two $n \times n$ matrices
can be multiplied by performing $O(n^\omega)$ arithmetic operations, e.g., $\omega = 2.376$.

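While the height of P exceeds $\sqrt{n}$, we repeat the following steps: build the
decomposition of P into levels from scratch, find a maximum chain C in P, record
C and remove C from P. Each iteration removes more than $\sqrt{n}$ elements, so there
are fewer than $\sqrt{n}$ iterations, each taking $O(n^2)$ time. This first phase thus
takes $O(\sqrt{n} \cdot n^2) = O(n^{2.5})$ time.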

Now the height of P is at most $\sqrt{n}$. We continue as before except that, instead of rebuilding
the levels each time from scratch, we update them. To this end, we maintain for
each element v of P a table of predecessors. Suppose v lies in level $L_i$. Then the
$j$th entry of the table gives the list of predecessors of v lying j levels down, in level
$L_{i-j}$.

Updating the levels is done as follows. First, for each element u of the chain C,
we delete u from P and update the table of predecessors of every successor of u.
We mark every element $v \in P - C$ such that the first component of the predecessor
table for v becomes empty. Second, for $i = 1, \ldots, h$, we process the $i$th level $L_i$. For
each element u that is marked, we determine the minimum index j such that the
$j$th component of the predecessor table for u is non-empty, move u to level $L_{i-j+1}$,
update the predecessor table for u and the predecessor table of every successor v of
u. Again, we mark every element v such that the first component of the predecessor
table for v becomes empty.

In order to analyze the algorithm, we assign to each relation of P a "score". The
score of $u <_P v$ is $i + j$, where i and j are the indices of the levels containing
u and v, respectively. Initially, the score of each relation is $O(\sqrt{n})$. Each time a
relation is considered, its score is decreased by at least one. Hence, a given relation
is considered $O(\sqrt{n})$ times through all the updates. Thus, the second phase of the
algorithm also takes $O(n^2 \sqrt{n}) = O(n^{2.5})$ time.

Therefore, a greedy chain decomposition can be found in $O(n^{2.5})$ time. □

B. COMPUTING THE ENTROPY OF CONVEX BIPARTITE GRAPHS 

Proof of Lemma 17. Let A, B denote a bipartition of the vertices of G. Without
loss of generality, G is A-convex, that is, there is a linear ordering on the vertices
in A such that the neighborhood of every vertex of B is an interval in A.

We explain how to implement one iteration of the method of Körner and Marton
[1988] described in Section 7.1. As before, we denote by G' the current graph,
and by A', B' its current bipartition. Thus G' is A'-convex.

Vertices in A' that are isolated in G' are dealt with first and separately. Thus, 
we may assume that no vertex in A' is isolated in G' . Similarly, we may assume 
that A' is nonempty.

First, the algorithm determines the maximum ratio (7) achievable by a subset
$A_1 \subseteq A'$, by bisection. Let

$$\rho := \beta/\alpha$$

denote the guessed ratio, with $\beta, \alpha \in \{1, \ldots, n\}$. Since there are $O(n^2)$ possible
ratios, the number of guesses is $O(\log n)$. Next, we prove that we can decide in
$O(n \log n)$ time whether there exists a subset $A_1 \subseteq A'$ whose ratio is larger than $\rho$,
or whether no such subset exists.
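The bisection over candidate ratios can be illustrated by the sketch below. It is not meant as an efficient implementation: for clarity it materializes and sorts all candidate ratios, which already costs more than the logarithmic number of guesses it then makes. The function name and the decision oracle `exceeds` (returning whether some subset has ratio strictly larger than the guess) are assumptions of the example, and the sketch relies on the maximum ratio being one of the candidates.

```python
from fractions import Fraction


def max_ratio_by_bisection(n, exceeds):
    """Find the largest achievable ratio among candidates beta/alpha.

    `exceeds(rho)` must return True exactly when some subset has ratio
    strictly larger than rho. It is therefore True below the optimum and
    False from the optimum on, so the optimum is the first candidate on
    which it returns False. Returns (optimum, number_of_guesses).
    """
    candidates = sorted(
        {Fraction(beta, alpha)
         for alpha in range(1, n + 1)
         for beta in range(1, n + 1)}
    )
    lo, hi = 0, len(candidates) - 1
    guesses = 0
    while lo < hi:
        mid = (lo + hi) // 2
        guesses += 1
        if exceeds(candidates[mid]):
            lo = mid + 1      # the optimum is strictly above this guess
        else:
            hi = mid          # the optimum is at most this guess
    return candidates[lo], guesses


# Toy oracle for an instance whose true maximum ratio is 2.
best, guesses = max_ratio_by_bisection(3, lambda rho: Fraction(2) > rho)
```

Exact fractions are used so that distinct candidates are never confused by rounding. Since there are at most $n^2$ candidates, the loop makes $O(\log n)$ calls to the oracle.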

Consider the network D obtained from G' by directing all its edges from A' to
B', adjoining a source vertex s sending a directed edge to each vertex of A', and
a sink vertex t receiving a directed edge from each vertex of B'. The capacities of
the directed edges incident to s (resp. t) are set to $\alpha$ (resp. $\beta$). The capacities of
the other directed edges are set to $\infty$. Because the s-t cut defined by $\{s\} \cup A'$ has
capacity $\alpha|A'|$, two cases can occur: either the minimum capacity of an s-t cut in D
equals $\alpha|A'|$ (case (i)), or it is less than $\alpha|A'|$ (case (ii)).

We claim that there exists a subset $A_1 \subseteq A'$ such that the ratio (7) is larger than
$\rho$ if and only if case (ii) arises. Indeed, if such a subset $A_1$ exists then the capacity
of the cut defined by $\{s\} \cup A_1 \cup N_{G'}(A_1)$ equals $\alpha(|A'| - |A_1|) + \beta|N_{G'}(A_1)|$, which
is less than $\alpha|A'|$ because (7) is larger than $\beta/\alpha$. Conversely, if case (ii) arises then
consider a minimum s-t cut defined by $\{s\} \cup X \cup Y$, where $X \subseteq A'$ and $Y \subseteq B'$.
By minimality of the cut, it follows that $Y = N_{G'}(X)$. Because the capacity of the
cut is less than $\alpha|A'|$, we conclude that

$$\frac{|X|}{|N_{G'}(X)|} > \frac{\beta}{\alpha}.$$

The claim follows. By the max-flow min-cut theorem, case (ii) arises if and only if
the maximum value of an s-t flow in D is strictly smaller than $\alpha|A'|$.
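The flow-based test can be checked on small instances with the sketch below. It deliberately swaps in a generic augmenting-path maximum flow (Edmonds-Karp) instead of the adapted Glover algorithm, so it does not achieve the $O(n \log n)$ bound; it only illustrates the equivalence between "some subset has ratio larger than $\beta/\alpha$" and "the maximum flow is smaller than $\alpha|A'|$". All names, and the encoding of the graph as a list of neighbor sets, are assumptions of the example. The ratio of a subset is taken to be its size divided by the size of its neighborhood, as the cut computation above suggests.

```python
from collections import deque
from itertools import combinations

INF = float("inf")


def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts of residual capacities (mutated)."""
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:       # push flow along the path
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck


def ratio_exceeds(nbrs, num_b, alpha, beta):
    """True iff the max flow in D is below alpha * |A'| (case (ii)).

    nbrs[u] is the set of B'-vertices adjacent to the A'-vertex u;
    every A'-vertex is assumed to have at least one neighbor.
    """
    s, t = "s", "t"
    cap = {s: {}, t: {}}
    for u in range(len(nbrs)):
        cap[("a", u)] = {s: 0}
        cap[s][("a", u)] = alpha
    for v in range(num_b):
        cap[("b", v)] = {t: beta}
        cap[t][("b", v)] = 0
    for u, neighbors in enumerate(nbrs):
        for v in neighbors:
            cap[("a", u)][("b", v)] = INF
            cap[("b", v)][("a", u)] = 0
    return max_flow(cap, s, t) < alpha * len(nbrs)


def brute_force_exceeds(nbrs, alpha, beta):
    """Directly test every non-empty subset S: alpha*|S| > beta*|N(S)|."""
    for k in range(1, len(nbrs) + 1):
        for subset in combinations(range(len(nbrs)), k):
            neighborhood = set().union(*(nbrs[u] for u in subset))
            if alpha * len(subset) > beta * len(neighborhood):
                return True
    return False
```

On a toy graph such as `nbrs = [{0}, {0}, {1, 2}]` with three vertices on the B' side, the two functions can be compared for all small values of `alpha` and `beta`; the brute-force check is exponential and is only there as a reference.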

Computing a maximum s-t flow in D amounts to computing a maximum b-matching
in the convex bipartite graph G', where the weights of the vertices are
defined as $b_u := \alpha$ whenever $u \in A'$ and $b_v := \beta$ whenever $v \in B'$. This can be
done in $O(n \log n)$ time by adapting Glover's algorithm for computing a maximum
matching in a convex bipartite graph [Glover 1967] to the weighted case, and using
a heap for storing vertices of B'.

Second, once the maximum possible value of the ratio (7) is determined, we
seek a maximizer $A_1 \subseteq A'$. This amounts to converting the last maximum s-t
flow computed during the bisection into a minimum s-t cut. Because case (i)
arises, the value of the flow equals $\alpha|A'|$. Hence, all the directed edges incident
to s are saturated. If all the directed edges incident to t are also saturated, then
$\alpha|A'| = \beta|B'|$ and we may take $A_1 := A'$. Otherwise, we perform a BFS from t
in an auxiliary network obtained from D by deleting all saturated directed edges,
reversing all non-saturated edges (in particular, all the edges from A' to B') and
adding all the directed edges (u, v) from A' to B' that carry a nonzero flow. Because
G' is A'-convex and the support of the maximum s-t flow in D constructed by
Glover's algorithm is of size $O(n)$, we can perform the BFS in $O(n \log n)$ time. We
then define $A_1$ as the vertices of A' that cannot be reached from t in the auxiliary
network. The lemma follows. □



